DApp Storage – How to Store Data Other Than Transactions in a DApp

contract-design, contract-development, dapp-development, dapps, storage

For the purpose of illustration, say I would like to create a decentralized version of Yelp.

A centralized approach would be to have a restaurants table and a reviews table in a MySQL database. In the case of a DApp, what is the best practice for storing non-transactional data?

My current thinking is to have a Restaurant.sol contract and a Review.sol contract, each holding a mapping from record ID (key) to record object (value). Every time a restaurant is added, we invoke an addRecord() method in Restaurant, which adds the new restaurant data to its mapping. (The flow for Review is similar.)

Is this approach of treating a contract like an RDBMS table robust? Or am I missing something?

What's the limit to the amount of data that can be stored in the mapping of a contract?

EDIT: From my understanding, changing the mapping of the Restaurant contract costs ether. So does that mean every new record added to the "database" costs money? What's an economically viable way to run an app without requiring users to pay?

Please advise.

Best Answer

First, you don't want to think of contracts as analogous to tables. Each contract can hold multiple mappings of information. You could have a single ReviewSystem.sol that has mappings for both Restaurants and Reviews, and in line with what you suggested earlier you could have addRestaurant() and addReview() methods that would add records to the mappings stored on the contract.

That said, contracts don't have very good mechanisms for normalized relational data. In a relational database, if you had a "Restaurant" record you could run a query for all reviews relating to a certain restaurant, and with the power of indexes and query planners get that information back very quickly. In Ethereum your options are more limited. If you don't have an index you'll have to scan over every review in your system to see if it relates to the restaurant in question. If you do have an index, you increase the costs of writing records to your contract. You might also denormalize the data for faster lookups, again at a cost of increased contract storage.
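The index trade-off above can be made concrete with a toy model. The sketch below is plain Python, not contract code: the class `ReviewStore` and its counters are hypothetical, and it simply counts one "storage write" per record and one more per index entry. A real contract would pay gas for each of those writes, and appending to an on-chain array typically touches more than one storage slot, so treat the numbers as a lower bound on the relative cost.

```python
from collections import defaultdict


class ReviewStore:
    """Toy in-memory model of contract storage (hypothetical, for illustration)."""

    def __init__(self, use_index):
        self.use_index = use_index
        self.reviews = {}                        # review_id -> (restaurant_id, rating)
        self.by_restaurant = defaultdict(list)   # restaurant_id -> [review_id, ...]
        self.writes = 0                          # storage writes performed so far
        self.reads = 0                           # storage reads performed by lookups

    def add_review(self, review_id, restaurant_id, rating):
        self.reviews[review_id] = (restaurant_id, rating)
        self.writes += 1
        if self.use_index:
            # Maintaining the index costs at least one extra write per review.
            self.by_restaurant[restaurant_id].append(review_id)
            self.writes += 1

    def reviews_for(self, restaurant_id):
        if self.use_index:
            ids = list(self.by_restaurant.get(restaurant_id, []))
            self.reads += len(ids)
            return ids
        # Without an index, every stored review has to be inspected.
        found = []
        for review_id, (rest_id, _rating) in self.reviews.items():
            self.reads += 1
            if rest_id == restaurant_id:
                found.append(review_id)
        return found


plain = ReviewStore(use_index=False)
indexed = ReviewStore(use_index=True)
for store in (plain, indexed):
    for i in range(1000):
        store.add_review(i, restaurant_id=i % 100, rating=5)
    store.reviews_for(7)

print("no index:", plain.writes, "writes,", plain.reads, "reads")
print("indexed: ", indexed.writes, "writes,", indexed.reads, "reads")
```

With 1000 reviews spread over 100 restaurants, the indexed store doubles its writes but reads only the ten matching reviews, while the plain store has to read all 1000.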

One option is to keep some of your information off-chain. For example, you might keep your restaurants and reviews on your contract, with a pointer from each Review to its Restaurant. Separately, you might run a traditional database that indexes the relationship from Reviews to Restaurants. When someone adds a review via your DApp, the contract stores the canonical information and the separate database keeps a copy. When someone wants to look up the reviews for a certain restaurant, they can query an API backed by the database, which quickly returns the IDs of the reviews, and then fetch the reviews themselves from the blockchain. In this case, all of the critical information is available on-chain, and the off-chain index can be reconstructed from the contract state. To learn about tracking contract events off-chain, read about Events and Logs.
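As a rough sketch of that off-chain index, the snippet below replays a list of already-decoded contract events into an in-memory SQLite table and queries it by restaurant. Everything about the event shape is an assumption: the event names `RestaurantAdded` and `ReviewAdded`, their fields, and the `EVENTS` list itself are hypothetical stand-ins for whatever your contract emits and whatever client library you use to fetch logs.

```python
import sqlite3

# Hypothetical, already-decoded event log in the order the events were emitted.
EVENTS = [
    {"event": "RestaurantAdded", "restaurant_id": 1},
    {"event": "ReviewAdded", "review_id": 10, "restaurant_id": 1},
    {"event": "RestaurantAdded", "restaurant_id": 2},
    {"event": "ReviewAdded", "review_id": 11, "restaurant_id": 2},
    {"event": "ReviewAdded", "review_id": 12, "restaurant_id": 1},
]


def rebuild_index(events):
    """Rebuild the review -> restaurant index from scratch by replaying events."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE review_index ("
        "review_id INTEGER PRIMARY KEY, restaurant_id INTEGER NOT NULL)"
    )
    db.execute("CREATE INDEX idx_restaurant ON review_index (restaurant_id)")
    for ev in events:
        if ev["event"] == "ReviewAdded":
            db.execute(
                "INSERT OR REPLACE INTO review_index VALUES (?, ?)",
                (ev["review_id"], ev["restaurant_id"]),
            )
    db.commit()
    return db


def review_ids_for(db, restaurant_id):
    """Return the IDs of all reviews for one restaurant, in ascending order."""
    rows = db.execute(
        "SELECT review_id FROM review_index "
        "WHERE restaurant_id = ? ORDER BY review_id",
        (restaurant_id,),
    )
    return [row[0] for row in rows]


db = rebuild_index(EVENTS)
print(review_ids_for(db, 1))
```

Because the table is derived entirely from the event list, it can be dropped and rebuilt at any time; the database holds only IDs, and the review contents are still read from the chain.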

As you noted in your edit, every piece of information you save to a contract costs gas, and that gas in turn costs Ether. Running a service like Yelp, with hundreds of thousands of businesses and thousand-word reviews, could prove very expensive. You could mitigate some of the costs by storing more information off-chain. Perhaps on-chain a review consists of a star rating and the hash of the written review, and the review text is retrieved from an off-chain source such as the database we discussed earlier, or perhaps another decentralized system like IPFS.
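The hash-on-chain idea can be sketched as follows. This is an illustration under stated assumptions: it uses SHA-256 from Python's standard library as a stand-in for whatever hash the contract would actually use (Solidity's built-in is keccak256, which is a different function), and the helper names `commit`, `verify`, and `slots_needed` are hypothetical. The slot count reflects only the fact that EVM storage is organized in 32-byte slots; it ignores length prefixes and other overhead.

```python
import hashlib
import math

SLOT_BYTES = 32  # EVM storage is organized in 32-byte slots


def slots_needed(num_bytes):
    """Minimum number of 32-byte storage slots needed to hold num_bytes."""
    return math.ceil(num_bytes / SLOT_BYTES)


def commit(review_text):
    """Digest that would be stored on-chain in place of the full text.

    SHA-256 is used here as a stand-in; a contract would more likely use keccak256.
    """
    return hashlib.sha256(review_text.encode("utf-8")).digest()


def verify(review_text, onchain_digest):
    """Check text fetched from an off-chain source against the on-chain digest."""
    return commit(review_text) == onchain_digest


review = "Great tacos. " * 400           # about 5 KB of review text
digest = commit(review)

print("full text:", slots_needed(len(review.encode("utf-8"))), "slots")
print("hash only:", slots_needed(len(digest)), "slot")
print("untampered text verifies:", verify(review, digest))
print("edited text verifies:   ", verify(review + "!", digest))
```

Anyone who fetches the text from the database or IPFS can recompute the digest and compare it with the value on the contract, so the off-chain host cannot alter a review without detection, even though the contract stores only a fixed-size value per review.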

In general, though, I see the Ethereum blockchain as the place to store information that you don't want to trust third parties to manage. Things like ERC20 tokens, ENS, and distributed exchanges are great use cases because it would otherwise be difficult to establish trust in a centralized entity. While it would be neat to have something like a decentralized review service, the cost of contract storage, combined with the relatively low risk of trusting a third party to manage that information, makes it a less attractive use case for the blockchain.
