[Ethereum] How does data storage on the blockchain work?


If I store file chunks in the blocks, I'm duplicating them all over the network: my file's chunks will be on every computer in the blockchain network, so eventually there are many copies of the file, one in each computer's copy of the block. On the other hand, if I store only a reference to the file in the block (for example, the block holds: Myfile.txt is stored at IP: ), then each computer that has my file can change it easily. And even if I check whether or not the computer in my example above changed my file (by comparing the file's hash before and after storage) and the answer is "True", I still don't get my file back, so the data storage on the blockchain is pointless.

So, how do companies like Storj work? How do they store the data on the blockchain? (As I understand it, Storj stores references in the blocks to 3 computers that hold my file, so how do they avoid the problem I mentioned above?)

Thanks in advance.

Best Answer

The short answer is: don't store chunks of files in the blockchain. It's not well suited to this purpose.

Better answer. It helps to consider separate concerns:

  1. Smart Contracts. Minimalist data that must be true for all users at all times. Focus is on fidelity, as it's extremely difficult to post false data into a contract that's careful about updates.
  2. Distributed Storage of large objects. Swarm, Storj and IPFS are examples of distributed object storage or distributed file systems. They use various strategies for smearing out copies and shards across participating nodes, but they are not putting all things in all places at all times.
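
To make the second point concrete, here is a toy sketch in Python of spreading shards over only some of the nodes. It is not how Swarm, Storj or IPFS actually place data; the node names, the chunk size, the replica count and the hash-based placement rule are all assumptions made up for illustration.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut the data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(chunks: list[bytes], node_names: list[str], replicas: int) -> dict[str, list[str]]:
    """Map each chunk's SHA-256 id to `replicas` distinct nodes (hypothetical rule)."""
    if replicas > len(node_names):
        raise ValueError("cannot place more replicas than there are nodes")
    placement = {}
    for chunk in chunks:
        chunk_id = hashlib.sha256(chunk).hexdigest()
        # Pick a starting node from the hash, then take the next nodes in order.
        start = int(chunk_id, 16) % len(node_names)
        placement[chunk_id] = [
            node_names[(start + k) % len(node_names)] for k in range(replicas)
        ]
    return placement


# Hypothetical network of five nodes; each chunk lives on three of them.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
data = b"".join(i.to_bytes(4, "big") for i in range(256))  # 1024 bytes of sample data
chunks = split_into_chunks(data, chunk_size=256)
placement = place_chunks(chunks, nodes, replicas=3)

for chunk_id, holders in placement.items():
    print(chunk_id[:12], "->", holders)
```

Each chunk ends up on three of the five nodes instead of on all of them, which is the sense in which such systems avoid "all things in all places at all times".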

At a very high level, we can consider a combination of these approaches. A distributed storage system can provide resilient storage of a blob, media file, or JSON object with a lot of data, known by its file name or path: a unique identifier.

A Smart Contract can track the identifier of the object that is valid, now. It can hold a validation hash useful for confirming that the correct data has been loaded from the "other" source.

For example, one might have a contract that says: "There is a movie called HomeMovie located at 'url ...', its hash is 0x456, and this is the valid object as of this time." If an authorized user changes the file (a new version), they would update the Smart Contract: "Now the movie we're using is HomeMovie2, located at 'new url ...', and its hash is 0x567."
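
The bookkeeping in that example can be sketched off-chain in plain Python. This is only an in-memory stand-in for a contract, not Solidity and not any project's real API; `FileRegistry`, `Record`, `digest`, the owner names and the `storage://` locations are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def digest(content: bytes) -> str:
    """SHA-256 of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class Record:
    name: str
    location: str    # where the large object lives (off-chain)
    sha256_hex: str  # validation hash of the object that is valid now


class FileRegistry:
    """Stand-in for a contract that tracks the currently valid object."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.current: Optional[Record] = None

    def update(self, caller: str, name: str, location: str, sha256_hex: str) -> None:
        # Only the authorized user may change which object is the valid one.
        if caller != self.owner:
            raise PermissionError("only the owner may update the record")
        self.current = Record(name, location, sha256_hex)

    def verify(self, fetched: bytes) -> bool:
        """Check data loaded from the other source against the stored hash."""
        return self.current is not None and digest(fetched) == self.current.sha256_hex


registry = FileRegistry(owner="alice")

v1 = b"home movie, first cut"
registry.update("alice", "HomeMovie", "storage://example/HomeMovie", digest(v1))
print(registry.verify(v1))           # data matches the recorded hash
print(registry.verify(b"tampered"))  # altered data does not match

v2 = b"home movie, second cut"
registry.update("alice", "HomeMovie2", "storage://example/HomeMovie2", digest(v2))
print(registry.verify(v1))           # the old version is no longer the valid one
```

Only the name, location and hash are kept in the registry; the file itself stays in the storage system, and anyone who fetches it can recompute the hash to detect a changed copy.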

This keeps the Ethereum storage to a minimum, while providing a way to confirm no corruption of the data in the overall system. There is an extensive assortment of projects working toward a unified view of things; that is, one interface that does it all.

Hope it helps.

Update: See Blog: Simple Storage Patterns in Solidity
