DApp Storage – How to Store Data Other Than Transactions in a DApp

Tags: contract-design, contract-development, dapp-development, dapps, storage

For the purpose of illustration, say I would like to create a decentralized version of Yelp.

A centralized approach would be to have a restaurants table and a reviews table in a MySQL database. In the case of a DApp, what is the best practice for storing non-transactional data?

My current thinking is to have a Restaurant.sol contract and a Review.sol contract, each with a mapping from the record ID (key) to the record object (value). Every time a restaurant is added, we invoke an addRecord() method in Restaurant, which adds the new restaurant data to its mapping. (Review would follow a similar flow.)

Is this approach of treating a contract like an RDBMS table robust? Or am I missing something?

What's the limit to the amount of data that can be stored in the mapping of a contract?

EDIT: From my understanding, changing the mapping of the Restaurant contract would cost ether. So does that mean every new record added to the "database" costs money? What's an economically viable way to run an app without requiring users to pay?

Please advise.

Best Answer

First, you don't want to think of contracts as analogous to tables. Each contract can hold multiple mappings of information. You could have a single ReviewSystem.sol that has mappings for both Restaurants and Reviews, and in line with what you suggested earlier you could have addRestaurant() and addReview() methods that would add records to the mappings stored on the contract.
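As a rough illustration of the single-contract layout described above, here is a minimal sketch. The struct fields, the ID counters, and the validation rules are assumptions made for the example; only the names ReviewSystem, addRestaurant() and addReview() come from the discussion.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical sketch: one contract holding both record types.
contract ReviewSystem {
    struct Restaurant {
        string name;
        address addedBy;
    }

    struct Review {
        uint256 restaurantId; // pointer from Review to Restaurant
        uint8 rating;         // 1 to 5 stars (assumed scale)
        string text;
        address author;
    }

    // Record ID => record, one mapping per kind of record.
    mapping(uint256 => Restaurant) public restaurants;
    mapping(uint256 => Review) public reviews;

    // Each counter holds the most recently assigned ID.
    // IDs start at 1, so 0 can mean "no such record".
    uint256 public restaurantCount;
    uint256 public reviewCount;

    function addRestaurant(string calldata name) external returns (uint256 id) {
        id = ++restaurantCount;
        restaurants[id] = Restaurant(name, msg.sender);
    }

    function addReview(uint256 restaurantId, uint8 rating, string calldata text)
        external
        returns (uint256 id)
    {
        require(restaurantId != 0 && restaurantId <= restaurantCount, "unknown restaurant");
        require(rating >= 1 && rating <= 5, "rating out of range");
        id = ++reviewCount;
        reviews[id] = Review(restaurantId, rating, text, msg.sender);
    }
}
```

Storing the full review text on-chain, as this sketch does, is the expensive variant; it is shown only to mirror the table-like design from the question.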

That said, contracts don't have very good mechanisms for normalized relational data. In a relational database, if you had a "Restaurant" record you could run a query for all reviews relating to a certain restaurant, and with the power of indexes and query planners get that information back very quickly. In Ethereum your options are more limited. If you don't have an index you'll have to scan over every review in your system to see if it relates to the restaurant in question. If you do have an index, you increase the costs of writing records to your contract. You might also denormalize the data for faster lookups, again at a cost of increased contract storage.
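To make the index trade-off concrete, the sketch below keeps an on-chain index from restaurant ID to review IDs and, for comparison, a scan that works without it. All names here (IndexedReviews, reviewIdsByRestaurant, scanReviewIdsFor) are hypothetical, and the review struct is reduced to two fields to keep the example short.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical sketch of an on-chain index versus a full scan.
contract IndexedReviews {
    struct Review {
        uint256 restaurantId;
        uint8 rating;
    }

    mapping(uint256 => Review) public reviews;
    uint256 public reviewCount;

    // The index: restaurant ID => IDs of the reviews that point at it.
    mapping(uint256 => uint256[]) private reviewIdsByRestaurant;

    function addReview(uint256 restaurantId, uint8 rating) external returns (uint256 id) {
        id = ++reviewCount;
        reviews[id] = Review(restaurantId, rating);
        // Maintaining the index is an extra storage write on every insert.
        reviewIdsByRestaurant[restaurantId].push(id);
    }

    // With the index: a direct lookup.
    function reviewIdsFor(uint256 restaurantId) external view returns (uint256[] memory) {
        return reviewIdsByRestaurant[restaurantId];
    }

    // Without the index: visit every review, once to count and once to collect.
    function scanReviewIdsFor(uint256 restaurantId) external view returns (uint256[] memory ids) {
        uint256 matches;
        for (uint256 i = 1; i <= reviewCount; i++) {
            if (reviews[i].restaurantId == restaurantId) matches++;
        }
        ids = new uint256[](matches);
        uint256 next;
        for (uint256 i = 1; i <= reviewCount; i++) {
            if (reviews[i].restaurantId == restaurantId) ids[next++] = i;
        }
    }
}
```

The scan costs nothing extra at write time but grows linearly with the total number of reviews, while the index moves that cost into every addReview() call.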

One option is to keep some of your information off-chain. For example, you might keep your restaurants and reviews on your contract with a pointer from Review to Restaurant. Separately, you might have a traditional database that indexes the relationship from Reviews to Restaurants. When someone adds a review via your DApp, the contract stores the canonical information and the separate database keeps a copy. When someone wants to look up the reviews relating to a certain restaurant, they can query an API backed by the database, which quickly returns the IDs of the reviews, and then fetch the reviews themselves from the blockchain. In this case, all of the critical information is available on-chain, and the off-chain index can be reconstructed from the contract state. To learn about tracking contract events off-chain, read about Events and Logs.
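One way the contract side of this arrangement could look is sketched below: the contract stores the canonical record and emits an event that an off-chain indexer can follow. The contract name, the event name ReviewAdded, and its parameters are assumptions for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical sketch: canonical data on-chain, plus an event for off-chain indexers.
contract ReviewsWithEvents {
    struct Review {
        uint256 restaurantId; // pointer from Review to Restaurant
        uint8 rating;
        string text;
    }

    mapping(uint256 => Review) public reviews;
    uint256 public reviewCount;

    // Indexed parameters become log topics, so a node can filter on them.
    event ReviewAdded(
        uint256 indexed reviewId,
        uint256 indexed restaurantId,
        address indexed author
    );

    function addReview(uint256 restaurantId, uint8 rating, string calldata text)
        external
        returns (uint256 id)
    {
        id = ++reviewCount;
        reviews[id] = Review(restaurantId, rating, text);
        // The off-chain database can record (restaurantId -> reviewId) from this log.
        emit ReviewAdded(id, restaurantId, msg.sender);
    }
}
```

An indexer that replays ReviewAdded logs from the deployment block onward can rebuild the restaurant-to-reviews relationship at any time, which is what keeps the database disposable.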

As you noted in your edit, every piece of information you save to a contract costs gas, and that gas in turn costs Ether. Running a service like Yelp with hundreds of thousands of businesses and thousand-word reviews could prove very expensive. You could mitigate some of the costs by storing more information off-chain. Perhaps on-chain a review consists of a star rating and the hash of the written review, and the review text is retrieved from an off-chain source such as the database we discussed earlier, or perhaps another decentralized system like IPFS.
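A minimal sketch of the "star rating plus hash" idea follows. It assumes the client computes a keccak256 hash of the UTF-8 review text before submitting; that choice of hash, along with the contract and function names, is an assumption for the example. IPFS uses its own content identifiers, so an IPFS-backed variant would store that identifier instead.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical sketch: only a rating and a fixed-size hash are stored on-chain.
contract HashedReviews {
    struct Review {
        uint256 restaurantId;
        uint8 rating;      // 1 to 5 stars (assumed scale)
        bytes32 textHash;  // keccak256 of the review text kept off-chain
    }

    mapping(uint256 => Review) public reviews;
    uint256 public reviewCount;

    function addReview(uint256 restaurantId, uint8 rating, bytes32 textHash)
        external
        returns (uint256 id)
    {
        require(rating >= 1 && rating <= 5, "rating out of range");
        id = ++reviewCount;
        reviews[id] = Review(restaurantId, rating, textHash);
    }

    // Lets anyone check that text fetched off-chain matches what was committed.
    function matchesStoredHash(uint256 reviewId, string calldata text)
        external
        view
        returns (bool)
    {
        return keccak256(bytes(text)) == reviews[reviewId].textHash;
    }
}
```

The storage cost per review is then constant regardless of how long the text is, and a tampered off-chain copy fails the hash check.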

In general, though, I see the Ethereum blockchain as the place where you store information that you don't want to trust third parties to manage. Things like ERC20 tokens, ENS, and distributed exchanges are great use cases because it would otherwise be difficult to establish trust in a centralized entity. While it would be neat to have something like a decentralized review service, the cost of contract storage, combined with the relatively low risk of trusting a third party to manage that information, makes it a less attractive use case for the blockchain.

Related Solutions

[Ethereum] The best practice to store and retrieve large data in Solidity smart contracts

event is a keyword in Solidity that lets a contract emit data that you can easily retrieve in a JavaScript front end. It is cheaper in terms of gas to emit an event than to write into a storage variable (like your struct array). You can then .watch for these events (the web3.js 0.x API; later versions expose event subscriptions instead) and pull them into your UI. This is the recommended way to go. For performance reasons, you might want to cache this event data somehow, because parsing through all blocks on every reload is fairly inefficient.
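As a hedged illustration of the event-only approach, the sketch below writes nothing to contract storage and relies entirely on logs. The contract name ReviewLog and the event ReviewPosted are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical sketch: data is published through events only, never stored.
contract ReviewLog {
    event ReviewPosted(
        uint256 indexed restaurantId,
        address indexed author,
        uint8 rating,
        string text
    );

    function postReview(uint256 restaurantId, uint8 rating, string calldata text) external {
        require(rating >= 1 && rating <= 5, "rating out of range");
        // No storage write: the data lives in the transaction log.
        emit ReviewPosted(restaurantId, msg.sender, rating, text);
    }
}
```

The trade-off is that contracts cannot read event logs, so this only fits data that is consumed off-chain, for example by the UI.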

Also, do not try to fit all your data into your smart contract. A well-designed DApp keeps on-chain nothing more than the absolutely required elements and pointers to external structures, and keeps its storage layer separate from the blockchain. The storage layer could be implemented via IPFS or Swarm.

DApp Development – How to Update DApp and Retain Existing Data

A DApp's only data sources are typically smart contract code and blockchain data, with no centralized databases or other sources. The entire flow happens between the client and the blockchain. Upgrading your DApp (front end) doesn't necessarily mean you have to update your data (smart contracts), though. Depending on where the DApp is hosted, you can redeploy it and still refer to the same smart contracts. Smart contracts, on the other hand, are immutable once deployed. There is no way to change the code of a smart contract, or a transaction, once it has been deployed to the network.

There are patterns that can help you to separate your data from business logic, or help upgrade (replace) your smart contracts. You need to design for this upfront before you deploy your first contract.

Eternal Storage

A common pattern is Eternal Storage. This is a smart contract that contains only data, as a sort of key/value store, and no business logic. It plays a role similar to the database in a more traditional web application.

A basic example

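// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Data-only contract: a key/value store with no business logic.
// Note: the setters are unrestricted here for brevity; a real deployment
// would limit them to the current logic contract.
contract EternalStorage {

    // Key => stored unsigned integer.
    mapping(bytes32 => uint) uIntStorage;

    // Read the value stored under _key (0 if it was never set).
    function getUint(bytes32 _key) external view returns(uint) {
        return uIntStorage[_key];
    }

    // Write _value under _key.
    function setUint(bytes32 _key, uint _value) external {
        uIntStorage[_key] = _value;
    }

    // Reset the value under _key to 0.
    function deleteUint(bytes32 _key) external {
        delete uIntStorage[_key];
    }
}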

Separating your storage contract from your business logic allows you to keep your data, while still allowing the flexibility to change your logic.
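To show how business logic might sit on top of such a store, here is a minimal sketch of a replaceable logic contract. The interface mirrors the getUint/setUint functions from the basic example; the contract name ReviewCounterV1, the key naming scheme, and the review-counting use case are assumptions for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Subset of the EternalStorage functions used by the logic contract.
interface IEternalStorage {
    function getUint(bytes32 key) external view returns (uint256);
    function setUint(bytes32 key, uint256 value) external;
}

// Hypothetical logic contract: holds no data of its own, so it can be replaced.
contract ReviewCounterV1 {
    IEternalStorage public immutable store;

    constructor(IEternalStorage _store) {
        store = _store;
    }

    // Derive a unique storage key per restaurant (naming scheme is an assumption).
    function reviewCountKey(uint256 restaurantId) public pure returns (bytes32) {
        return keccak256(abi.encodePacked("restaurant.reviewCount", restaurantId));
    }

    function reviewCount(uint256 restaurantId) external view returns (uint256) {
        return store.getUint(reviewCountKey(restaurantId));
    }

    function recordReview(uint256 restaurantId) external {
        bytes32 key = reviewCountKey(restaurantId);
        store.setUint(key, store.getUint(key) + 1);
    }
}
```

A later ReviewCounterV2 deployed against the same store address would see the same counts, provided it derives its keys the same way.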


Proxy Contracts

Another common approach is a proxy architecture, which lets you use a newly deployed contract as if your main logic had been upgraded. All message calls go through a Proxy contract that redirects them to the latest deployed contract logic. To upgrade, you deploy a new version of your contract and update the Proxy to reference the new contract address.
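The following is a minimal sketch of such a proxy, assuming the common delegatecall-based design, in which the logic contract's code runs against the proxy's own storage. The contract name, the slot labels, and the single-admin upgrade rule are assumptions for the example; audited implementations handle more edge cases, such as function selector clashes between the proxy and the logic contract.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical minimal upgradeable proxy. A sketch, not production code.
contract SimpleProxy {
    // Bookkeeping lives in hashed slots so it does not collide with the
    // logic contract's own variables, which start at slot 0.
    bytes32 private constant IMPLEMENTATION_SLOT = keccak256("example.proxy.implementation");
    bytes32 private constant ADMIN_SLOT = keccak256("example.proxy.admin");

    constructor(address implementation_) {
        _setAddress(ADMIN_SLOT, msg.sender);
        _setAddress(IMPLEMENTATION_SLOT, implementation_);
    }

    // Point the proxy at a newly deployed version of the logic contract.
    function upgradeTo(address newImplementation) external {
        require(msg.sender == _getAddress(ADMIN_SLOT), "not admin");
        _setAddress(IMPLEMENTATION_SLOT, newImplementation);
    }

    // Any call that does not match a function above is forwarded.
    fallback() external payable {
        _delegate();
    }

    // Plain ether transfers are forwarded as well.
    receive() external payable {
        _delegate();
    }

    // Forward the call; this function never returns to Solidity code.
    function _delegate() private {
        address target = _getAddress(IMPLEMENTATION_SLOT);
        assembly {
            // Copy the incoming calldata to memory position 0.
            calldatacopy(0, 0, calldatasize())
            // Run the logic contract's code in the proxy's storage context.
            let ok := delegatecall(gas(), target, 0, calldatasize(), 0, 0)
            // Copy whatever the logic contract returned or reverted with.
            returndatacopy(0, 0, returndatasize())
            switch ok
            case 0 { revert(0, returndatasize()) }
            default { return(0, returndatasize()) }
        }
    }

    function _getAddress(bytes32 slot) private view returns (address value) {
        assembly { value := sload(slot) }
    }

    function _setAddress(bytes32 slot, address value) private {
        assembly { sstore(slot, value) }
    }
}
```

Because state lives in the proxy, each new logic version has to keep the storage layout of the previous one; users keep calling the same proxy address across upgrades.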