Blockchain Storage – How Blocks and Tries are Stored in Merkle Patricia Tries


I am still not sure I understand how some of Ethereum's structures are physically stored (assuming the Geth implementation):

  • State Trie: a single off-chain Merkle Patricia Trie, stored using LevelDB;

  • Storage Trie: one Merkle Patricia Trie per account; stored off-chain together with the State Trie using LevelDB;

  • Transaction Tries: not really physically stored; a Merkle Patricia Trie is created on the fly when needed using the block transaction list;

  • Receipts Trie: not a clue;

  • Blocks: the State Trie and Storage Tries are stored in .ldb files (LevelDB), but where can I find the block files, and in which format are they stored?

The low-level geth database format is:

var databaseVerisionKey = Buffer.from("DatabaseVersion"); // databaseVerisionKey tracks the current database version.
var headHeaderKey = Buffer.from("LastHeader"); // headHeaderKey tracks the latest known header's hash.
var headBlockKey = Buffer.from("LastBlock"); // headBlockKey tracks the latest known full block's hash.
var headFastBlockKey = Buffer.from("LastFast"); // headFastBlockKey tracks the latest known incomplete block's hash during fast sync.
var fastTrieProgressKey = Buffer.from("TrieSync"); // fastTrieProgressKey tracks the number of trie entries imported during fast sync.

// Data item prefixes (use single byte to avoid mixing data types, avoid `i`, used for indexes).
var headerPrefix = Buffer.from("h"); // headerPrefix + num (uint64 big endian) + hash -> header
var headerTDSuffix = Buffer.from("t"); // headerPrefix + num (uint64 big endian) + hash + headerTDSuffix -> td
var headerHashSuffix = Buffer.from("n"); // headerPrefix + num (uint64 big endian) + headerHashSuffix -> hash
var headerNumberPrefix = Buffer.from("H"); // headerNumberPrefix + hash -> num (uint64 big endian)
var blockBodyPrefix = Buffer.from("b"); // blockBodyPrefix + num (uint64 big endian) + hash -> block body
var blockReceiptsPrefix = Buffer.from("r"); // blockReceiptsPrefix + num (uint64 big endian) + hash -> block receipts
var txLookupPrefix = Buffer.from("l"); // txLookupPrefix + hash -> transaction/receipt lookup metadata
var bloomBitsPrefix = Buffer.from("B"); // bloomBitsPrefix + bit (uint16 big endian) + section (uint64 big endian) + hash -> bloom bits
var preimagePrefix = Buffer.from("secure-key-"); // preimagePrefix + hash -> preimage
var configPrefix = Buffer.from("ethereum-config-"); // config prefix for the db

// Chain index prefixes (use `i` + single byte to avoid mixing data types).
var BloomBitsIndexPrefix = Buffer.from("iB"); // BloomBitsIndexPrefix is the data table of a chain indexer to track its progress
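As a rough illustration of the key layouts described in the comments above, the following sketch composes a few of those keys with Node's Buffer. The helper names (encodeBlockNumber, headerKey, canonicalHashKey, headerNumberKey, blockBodyKey) and the example hash are made up for this example and are not taken from geth.

```javascript
// Sketch only: composes LevelDB keys following the layouts in the listing above.
// All function names here are hypothetical helpers, not geth's API.
const headerPrefix = Buffer.from("h");
const headerHashSuffix = Buffer.from("n");
const headerNumberPrefix = Buffer.from("H");
const blockBodyPrefix = Buffer.from("b");

// Encode a block number as 8 bytes, big endian (uint64).
function encodeBlockNumber(num) {
  const buf = Buffer.alloc(8);
  buf.writeBigUInt64BE(BigInt(num));
  return buf;
}

// headerPrefix + num (uint64 big endian) + hash -> header
function headerKey(num, hash) {
  return Buffer.concat([headerPrefix, encodeBlockNumber(num), hash]);
}

// headerPrefix + num (uint64 big endian) + headerHashSuffix -> hash
function canonicalHashKey(num) {
  return Buffer.concat([headerPrefix, encodeBlockNumber(num), headerHashSuffix]);
}

// headerNumberPrefix + hash -> num (uint64 big endian)
function headerNumberKey(hash) {
  return Buffer.concat([headerNumberPrefix, hash]);
}

// blockBodyPrefix + num (uint64 big endian) + hash -> block body
function blockBodyKey(num, hash) {
  return Buffer.concat([blockBodyPrefix, encodeBlockNumber(num), hash]);
}

// Made-up 32-byte block hash, for demonstration only.
const exampleHash = Buffer.alloc(32, 0xab);
console.log(headerKey(1, exampleHash).toString("hex"));
console.log(canonicalHashKey(1).toString("hex"));
```

With keys like these, a block can be found either by hash (headerNumberKey gives the number, then headerKey gives the header) or by number (canonicalHashKey gives the hash first).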

To get the data, you have to rebuild the tries recursively from these entries. Knowing the hash of the state root, you can look up the state root node; that node contains the hashes of its children, so you can look those up as well, and so on down to the leaves.
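The recursive lookup can be sketched with a toy hash-addressed store. This is a deliberately simplified model, not geth's format: a Map stands in for LevelDB, JSON stands in for RLP encoding, SHA-256 stands in for Keccak-256, and the node shapes and function names (putNode, collectLeaves) are invented for the example.

```javascript
const crypto = require("crypto");

// Assumption: SHA-256 is a stand-in; Ethereum hashes RLP-encoded nodes with Keccak-256.
function hashNode(encoded) {
  return crypto.createHash("sha256").update(encoded).digest("hex");
}

// Hypothetical in-memory store playing the role of LevelDB: hash -> encoded node.
const db = new Map();

// Store a node under the hash of its encoding and return that hash.
function putNode(node) {
  const encoded = JSON.stringify(node);
  const hash = hashNode(encoded);
  db.set(hash, encoded);
  return hash;
}

// Build a tiny tree bottom-up: parents reference their children by hash only.
const leafA = putNode({ value: "account A" });
const leafB = putNode({ value: "account B" });
const branch = putNode({ children: [leafA, leafB] });
const root = putNode({ children: [branch] });

// Starting from a root hash, fetch each node by its hash and recurse into the children.
function collectLeaves(hash) {
  const encoded = db.get(hash);
  if (encoded === undefined) {
    throw new Error("missing node " + hash);
  }
  const node = JSON.parse(encoded);
  if (node.children === undefined) {
    return [node.value];
  }
  return node.children.flatMap((childHash) => collectLeaves(childHash));
}

console.log(collectLeaves(root));
```

The point of the sketch is that only the root hash is needed as an entry point; everything below it is reachable through hashes stored inside the nodes themselves.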

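Depending on the geth options, geth stores some tries and not others: --syncmode fast|full|light selects how the chain is synced, and --gcmode full|archive selects whether old state tries are pruned (you can also specify how many blocks you want to remember).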

The different tries are the world state trie (which links to accounts), the storage tries (account data), and the receipt tries (transaction receipts).

To look up a key such as "sample value" (for example, a contract address) in a trie, you walk down the trie along the path given by sha3("sample value"). The hash is 32 bytes long, which is 64 hex characters (nibbles), so the path is at most 64 steps deep.
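As a small sketch of how a hashed key turns into a path, the snippet below splits a 32-byte hash into nibbles, one per step down the trie. SHA-256 is used only as a stand-in so that the example runs with Node's standard library; Ethereum uses Keccak-256, so the actual nibble values would differ. The function names hashKey and toNibbles are hypothetical.

```javascript
const crypto = require("crypto");

// Assumption: SHA-256 stands in for Keccak-256; both produce 32-byte digests.
function hashKey(key) {
  return crypto.createHash("sha256").update(key).digest();
}

// Split each byte into its high and low 4-bit halves (nibbles).
function toNibbles(buf) {
  const nibbles = [];
  for (const byte of buf) {
    nibbles.push(byte >> 4, byte & 0x0f);
  }
  return nibbles;
}

const path = toNibbles(hashKey("sample value"));
console.log(path.length); // 64 nibbles for a 32-byte hash
console.log(path.map((n) => n.toString(16)).join(""));
```

Each nibble selects one of up to 16 children at a branch node, which is why the trie is traversed per hex character rather than per byte.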

To better understand which data is stored in the db and how the tries are built, look at these two pictures:

