[Ethereum] Exploring and Analyzing the Ethereum blockchain


I would like to explore the data inside the Ethereum blockchain. All transactions and contracts are stored there, so it is an interesting open dataset and I'd like to get some insights out of it. But I am not entirely sure how I can access this data. Inside ~/.ethereum/chaindata (Linux) I can see the files containing the blockchain, but unlike the keys they cannot be explored easily. What kind of encoding do they use? What are the best technical tools (like databases) to work with the data in a non-interactive way?

I don't think that using a website to look up individual transactions or contract code, as you can on https://www.etherchain.org/, is the best solution. Instead I want to explore the whole dataset first and do a classical exploratory data analysis, so that later I can focus on the parts that interest me.

Suggestions about how to work with this data, about its format and how to manipulate it are welcome.

Best Answer

Data structures are stored in Merkle Patricia tries (read this and this), usually inside a LevelDB store. The chaindata is exactly that.
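To make the idea concrete, here is a minimal sketch of a hash-keyed trie on top of a key-value store. It is a toy under loud assumptions: a Python dict stands in for LevelDB, JSON stands in for the real node serialization, hashlib's SHA3-256 stands in for Keccak-256 (they are not the same function), and there is no Patricia path compression. The names NodeStore, insert and lookup are made up for illustration and do not come from any client.

```python
import hashlib
import json


class NodeStore:
    """A dict standing in for the LevelDB key-value store (assumption)."""

    def __init__(self):
        self.db = {}

    def put(self, node):
        # Serialize canonically, then key the node by the hash of its bytes.
        # JSON and SHA3-256 are stand-ins; real clients use a different
        # serialization and Keccak-256.
        raw = json.dumps(node, sort_keys=True).encode()
        key = hashlib.sha3_256(raw).hexdigest()
        self.db[key] = raw
        return key

    def get(self, key):
        return json.loads(self.db[key])


def insert(store, root, path, value):
    """Return the root hash of a new trie that also maps path -> value.

    path is a string of hex nibbles, root is a hash or None for an empty trie.
    Old nodes are never modified, so earlier roots stay readable.
    """
    node = store.get(root) if root is not None else {"children": {}, "value": None}
    if path == "":
        node["value"] = value
    else:
        nibble, rest = path[0], path[1:]
        child = node["children"].get(nibble)
        node["children"][nibble] = insert(store, child, rest, value)
    return store.put(node)


def lookup(store, root, path):
    """Walk from the root hash down the nibbles of path; None if absent."""
    if root is None:
        return None
    node = store.get(root)
    for nibble in path:
        child = node["children"].get(nibble)
        if child is None:
            return None
        node = store.get(child)
    return node["value"]


store = NodeStore()
root_v1 = insert(store, None, "a71", "first value")
root_v2 = insert(store, root_v1, "a72", "second value")
```

Because every node is stored under the hash of its own contents, changing any value changes every hash on the path up to the root. That is also why the raw files look opaque: the keys in the store are hashes, not account addresses or transaction ids.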

I think the structure might slightly depend on the actual node implementation?

Here are two of the trie implementations, which could be a good starting point:

Once that is understood, you will need to look up the data structures sent over the network, especially blocks, transactions, accounts and storage. Accounts refer to both externally owned and contract accounts, the latter with contract code. Storage refers to contract storage.
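On the encoding question: these structures are serialized with RLP (Recursive Length Prefix), which only knows byte strings and nested lists. Below is a minimal encoder sketch, not a replacement for a maintained library. The account at the end uses the four fields nonce, balance, storage root and code hash, but the two 32-byte hashes are zero-filled placeholders, not real values.

```python
def encode_length(length, offset):
    """Prefix for a payload of the given length (0x80 for strings, 0xC0 for lists)."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item):
    """Encode a non-negative int, a bytes object, or a (nested) list of those."""
    if isinstance(item, int):
        if item < 0:
            raise ValueError("RLP has no encoding for negative integers")
        # Integers become big-endian bytes without leading zeros; 0 becomes b"".
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single small byte is its own encoding
        return encode_length(len(item), 0x80) + item
    if isinstance(item, (list, tuple)):
        payload = b"".join(rlp_encode(element) for element in item)
        return encode_length(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")


# An account as a four-item list. The hashes are placeholders (assumption).
placeholder_storage_root = bytes(32)
placeholder_code_hash = bytes(32)
account = [1, 10**18, placeholder_storage_root, placeholder_code_hash]
encoded_account = rlp_encode(account)
```

Decoding is the same rules in reverse: read the first byte, work out whether it announces a single byte, a string or a list, and recurse. Once you can decode values read from the store, blocks and transactions turn into plain nested lists that you can load into whatever tool you prefer.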

Back to the point of data analysis

You might be better off building a traditional SQL or NoSQL database if you want quick lookups of all transactions made by an account or contract. The chaindata is optimised for validating transactions and maintaining a valid state, not for historical lookups.
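As a rough sketch of what that could look like, here is a single transactions table in SQLite with indexes on sender and recipient. The schema, the column names and the sample rows are all made up for illustration; in practice you would fill the table from decoded blocks or from a node's RPC interface.

```python
import sqlite3

# Made-up sample rows: (tx_hash, block_number, sender, recipient, value_wei).
# Values are kept as text because amounts in wei can exceed SQLite's 64-bit integers.
# A NULL recipient marks a contract-creation transaction.
SAMPLE_ROWS = [
    ("0xaaa1", 100, "0xalice", "0xbob", "1000000000000000000"),
    ("0xaaa2", 101, "0xbob", "0xcarol", "250000000000000000"),
    ("0xaaa3", 105, "0xalice", None, "0"),
    ("0xaaa4", 103, "0xcarol", "0xalice", "42"),
]


def build_db(rows):
    """Create an in-memory database with one indexed transactions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE transactions (
               tx_hash      TEXT PRIMARY KEY,
               block_number INTEGER NOT NULL,
               sender       TEXT NOT NULL,
               recipient    TEXT,
               value_wei    TEXT NOT NULL
           )"""
    )
    # These indexes are what make per-account history lookups cheap.
    conn.execute("CREATE INDEX idx_tx_sender ON transactions(sender)")
    conn.execute("CREATE INDEX idx_tx_recipient ON transactions(recipient)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def transactions_for(conn, address):
    """All transactions sent or received by address, oldest block first."""
    cursor = conn.execute(
        "SELECT tx_hash, block_number FROM transactions "
        "WHERE sender = ? OR recipient = ? ORDER BY block_number",
        (address, address),
    )
    return cursor.fetchall()


conn = build_db(SAMPLE_ROWS)
alice_history = transactions_for(conn, "0xalice")
```

The same question asked of the chaindata directly would mean walking every block, since nothing in the trie layout is keyed by "transactions of account X".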
