[Ethereum] If a blockchain is a distributed database, where is the data?

blockchain database

There are a lot of articles describing how Ethereum is just another database, and a number of posts here on this site that talk about storing data on the blockchain itself (e.g., [1], [2]). However, my question is a bit more basic than those. Specifically, if a blockchain is just a database, then:

  1. What kind of database is this (e.g., relational database, NoSQL, registry-style key/value pairs, plain text, etc.)? What older technology can I look to for an understanding of how it works? This Quora question has answers that suggest both relational-style (by mentioning foreign keys) and key/value pairs, so I figured I'd ask here and see if I can get more consensus.
  2. How do I "read" a value from the database? How can I be sure I'm getting the most recent version of that value (i.e., that no transactions have taken place that update it)?

Best Answer

Blockchains are databases from a high-level view, but the underlying technology is different enough to make some of the usual assumptions inaccurate. An Excel file is also a database, but it has characteristics that would be unnecessary or even harmful in, say, a NoSQL database.

The data "is" on every full node. That is, every full node stores a complete copy of the blockchain. This is indeed inefficient, and future development aims to remove this requirement.

  1. The official specification for the state (that is, the set of all contracts and accounts and their storage and balances) is a Merkle Patricia Trie, which is closest in spirit to a hash-authenticated key/value store. However, an individual node may implement it internally in whatever way it finds best. (geth uses LevelDB, for example.)
  2. Every full node has its own copy of the blockchain. Reading consists of querying one's own node for the data. There's no guarantee that you have the latest block, which is an inherent limitation of blockchains. There are several ways for a good dapp to cope with this environment; the primary one is simply not to assume that the local latest block is the actual latest.
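
To make point 2 a bit more concrete, here is a rough sketch of what "querying one's own node" could look like over the standard Ethereum JSON-RPC interface (`eth_blockNumber` and `eth_getBalance` are real methods there). Everything else is an assumption for illustration: the URL `http://localhost:8545`, the helper names, and the choice of 12 confirmations, which is an arbitrary safety margin rather than a recommended value. The idea is just to read state as of a block slightly behind the local head instead of trusting the local "latest".

```python
import json
import urllib.request

# Assumption: a local node with HTTP JSON-RPC enabled at this address.
NODE_URL = "http://localhost:8545"


def rpc_payload(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body for an Ethereum node."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def call_node(method, params, url=NODE_URL):
    """Send one JSON-RPC request to the node and return its 'result' field."""
    body = json.dumps(rpc_payload(method, params)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        reply = json.load(response)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


def hex_to_int(quantity):
    """Decode a JSON-RPC hex quantity such as '0x1b4' into an int."""
    return int(quantity, 16)


def safe_block_tag(latest_block, confirmations=12):
    """Return the hex tag of a block `confirmations` behind the local head."""
    return hex(max(latest_block - confirmations, 0))


def read_balance(address, confirmations=12):
    """Read a balance (in wei) as of a block behind the local head.

    The local head may not be the network's real head, and recent blocks
    can still be reorganised, so this deliberately reads slightly stale data.
    """
    latest = hex_to_int(call_node("eth_blockNumber", []))
    tag = safe_block_tag(latest, confirmations)
    return hex_to_int(call_node("eth_getBalance", [address, tag]))
```

With a node running, `read_balance("0x...")` (substituting a real address) would return the balance as the node saw it a dozen blocks ago. This trades freshness for stability; it does not tell you whether your node is in sync with the rest of the network, which still has to be checked separately.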