Ethereum – Understanding Block Structure, Roots, and Tries Concept


Trying to reinforce my understanding, so hoping you guys may help me out.

From reading the Ethereum wiki, I believe part of an Ethereum block header consists of the following:

stateRoot
transactionsRoot
receiptsRoot

These are the keccak hashes of the root nodes of the respective tries.

My questions are as follows:

Q1 – Where does geth store these tries? I reckon within the blocks themselves, i.e. the chaindata folder, within the ldb files? Or am I entirely off-base?

Q2 – (I am assuming the state trie is indeed kept in each block.) Does the state trie of block N reference the state trie of block N-1 (i.e. only record the accounts whose state has changed), or is the state trie duplicated across blocks?

Q3 – With state tree pruning, I reckon that the states are pruned because having the stateRoot effectively verifies that the state trie nodes are OK, hence it is safe to discard the state tries themselves. Would this understanding be correct from a high-level perspective?

Q4 – What would the .\ethereum\nodes folder be for?

Thanks!

Best Answer

Firstly, you'll want to take a look at this picture from a previous question for reference.

Q1 - Where does geth store these tries? I reckon within the blocks themselves, i.e. the chaindata folder, within the ldb files? Or am I entirely off-base?

The tries aren't actually part of the block proper. They're stored separately in a LevelDB key-value database, which on your machine is what's inside the chaindata folder (the .ldb files). The block header itself stores only the hashes of the roots of the various tries, the state trie being one of these.


Q2 - (I am assuming the state trie is indeed kept in each block.) Does the state trie of block N reference the state trie of block N-1 (i.e. only record the accounts whose state has changed), or is the state trie duplicated across blocks?

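Your first assumption isn't correct, as per Q1, but yes, the state trie references backwards to prevent duplication: the trie for block N creates new nodes only along the paths to the accounts that changed, and points by hash to the unchanged nodes already stored for block N-1. This picture from this previous answer helps visualise this.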
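The sharing between consecutive state tries can also be sketched in code. This is only a toy model, not how geth implements it: a `Map` stands in for LevelDB, SHA-256 and JSON stand in for keccak256 and RLP, the trie is binary with fixed-length paths rather than a hexary Merkle Patricia trie, and the names `putNode`, `getNode` and `update` are made up for the example.

```javascript
const crypto = require("crypto");

// Stand-in for LevelDB: node hash (hex) -> serialized node.
const db = new Map();

// Store a node under the hash of its serialization and return that hash.
// Identical nodes hash to the same key, so they are stored only once.
function putNode(node) {
  const encoded = JSON.stringify(node);
  const hash = crypto.createHash("sha256").update(encoded).digest("hex");
  db.set(hash, encoded);
  return hash;
}

const getNode = (hash) => JSON.parse(db.get(hash));

// Return the root hash of a new trie in which `path` (a string of "0"/"1")
// maps to `value`. Only the nodes along `path` are re-created; every other
// subtree is referenced by its existing hash.
// Assumes all paths have the same length.
function update(rootHash, path, value) {
  if (path.length === 0) return putNode({ value });
  const node = rootHash ? getNode(rootHash) : { children: [null, null] };
  const children = node.children.slice();
  const bit = Number(path[0]);
  children[bit] = update(children[bit], path.slice(1), value);
  return putNode({ children });
}

// "Block N-1": four accounts at 2-bit paths.
let rootPrev = null;
for (const [path, balance] of [["00", 10], ["01", 20], ["10", 30], ["11", 40]]) {
  rootPrev = update(rootPrev, path, balance);
}
const nodesBefore = db.size;

// "Block N": only the account at path "11" changes.
const rootNext = update(rootPrev, "11", 41);
const newNodes = db.size - nodesBefore;
const sharedSubtree =
  getNode(rootPrev).children[0] === getNode(rootNext).children[0];

console.log("new nodes written:", newNodes);
console.log("untouched subtree shared:", sharedSubtree);
```

Both roots stay resolvable from the same store, which is why each block header only needs to carry its own root hash.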

Q3 - With state tree pruning, I reckon that the states are pruned because having the stateRoot effectively verifies that the state trie nodes are OK, hence it is safe to discard the state tries themselves. Would this understanding be correct from a high-level perspective?

I'm not entirely sure, but this previous official blog post might help: https://blog.ethereum.org/2015/06/26/state-tree-pruning/

Q4 - What would the .\ethereum\nodes folder be for?

It's a database of the peers (network nodes) your node knows about. The entries are RLP-encoded blobs, so they're not readily readable. For further details, see "Format of LevelDB files in nodes directory? Trouble pulling contents with python leveldb API".

Related Solutions

Blockchain Storage – How Blocks and Tries are Stored in Merkle Patricia Tries

The low-level geth database format is:

var databaseVerisionKey = Buffer.from("DatabaseVersion"); // databaseVerisionKey tracks the current database version.
var headHeaderKey = Buffer.from("LastHeader"); // headHeaderKey tracks the latest known header's hash.
var headBlockKey = Buffer.from("LastBlock"); // headBlockKey tracks the latest known full block's hash.
var headFastBlockKey = Buffer.from("LastFast"); // headFastBlockKey tracks the latest known incomplete block's hash during fast sync.
var fastTrieProgressKey = Buffer.from("TrieSync"); // fastTrieProgressKey tracks the number of trie entries imported during fast sync.

// Data item prefixes (use single byte to avoid mixing data types, avoid `i`, used for indexes).
var headerPrefix = Buffer.from("h"); // headerPrefix + num (uint64 big endian) + hash -> header
var headerTDSuffix = Buffer.from("t"); // headerPrefix + num (uint64 big endian) + hash + headerTDSuffix -> td
var headerHashSuffix = Buffer.from("n"); // headerPrefix + num (uint64 big endian) + headerHashSuffix -> hash
var headerNumberPrefix = Buffer.from("H"); // headerNumberPrefix + hash -> num (uint64 big endian)
var blockBodyPrefix = Buffer.from("b"); // blockBodyPrefix + num (uint64 big endian) + hash -> block body
var blockReceiptsPrefix = Buffer.from("r"); // blockReceiptsPrefix + num (uint64 big endian) + hash -> block receipts
var txLookupPrefix = Buffer.from("l"); // txLookupPrefix + hash -> transaction/receipt lookup metadata
var bloomBitsPrefix = Buffer.from("B"); // bloomBitsPrefix + bit (uint16 big endian) + section (uint64 big endian) + hash -> bloom bits
var preimagePrefix = Buffer.from("secure-key-"); // preimagePrefix + hash -> preimage
var configPrefix = Buffer.from("ethereum-config-"); // config prefix for the db

// Chain index prefixes (use `i` + single byte to avoid mixing data types).
var BloomBitsIndexPrefix = Buffer.from("iB"); // BloomBitsIndexPrefix is the data table of a chain indexer to track its progress
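As a rough sketch of how these prefixes combine into actual LevelDB keys, the helpers below build a header key and a canonical-hash key following the comments in the listing. The function names `encodeBlockNumber`, `headerKey` and `canonicalHashKey` are illustrative names chosen for this example, and the block hash is a dummy value.

```javascript
// Sketch: composing LevelDB keys from the prefixes listed above.
// Helper names are illustrative; the hash below is a placeholder.

const headerPrefix = Buffer.from("h");
const headerHashSuffix = Buffer.from("n");

// Encode a block number as an 8-byte big-endian unsigned integer.
function encodeBlockNumber(num) {
  const buf = Buffer.alloc(8);
  buf.writeBigUInt64BE(BigInt(num));
  return buf;
}

// headerPrefix + num (uint64 big endian) + hash -> header
function headerKey(num, hash) {
  return Buffer.concat([headerPrefix, encodeBlockNumber(num), hash]);
}

// headerPrefix + num (uint64 big endian) + headerHashSuffix -> hash
function canonicalHashKey(num) {
  return Buffer.concat([headerPrefix, encodeBlockNumber(num), headerHashSuffix]);
}

const dummyHash = Buffer.alloc(32, 0xab); // placeholder 32-byte block hash
console.log(headerKey(1, dummyHash).toString("hex"));
console.log(canonicalHashKey(1).toString("hex"));
```

Looking up a header by number is then two reads: first the canonical-hash key to get the block hash, then the header key built from the number and that hash.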

To read data you have to walk the tries recursively, starting from these entries. Knowing the hash of the state root, you can look up the root node. That node contains the hashes of its children, so you can look those up in turn, and so on down to the leaves.
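A minimal sketch of that recursive lookup, under loud assumptions: a `Map` plays the role of LevelDB, SHA-256 and JSON replace keccak256 and RLP, and the node shape (`children` / `value`) plus the names `putNode` and `collectLeaves` are invented for the example rather than taken from geth.

```javascript
const crypto = require("crypto");

// Stand-in for LevelDB: node hash (hex) -> serialized node.
const db = new Map();

// Store a node under the hash of its serialization and return that hash.
function putNode(node) {
  const encoded = JSON.stringify(node);
  const hash = crypto.createHash("sha256").update(encoded).digest("hex");
  db.set(hash, encoded);
  return hash;
}

// Resolve a node hash into the leaf values beneath it: fetch the node by
// its hash, then follow each child hash in turn until leaves are reached.
function collectLeaves(hash) {
  const node = JSON.parse(db.get(hash));
  if (node.value !== undefined) return [node.value];
  return node.children.flatMap((childHash) => collectLeaves(childHash));
}

// Build a tiny tree bottom-up; only the root hash needs to be remembered.
const leafA = putNode({ value: "account A" });
const leafB = putNode({ value: "account B" });
const leafC = putNode({ value: "account C" });
const branch = putNode({ children: [leafA, leafB] });
const stateRoot = putNode({ children: [branch, leafC] });

const leaves = collectLeaves(stateRoot);
console.log(leaves);
```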

Depending on the geth options --syncmode full|fast|light and --gcmode full|archive (you can also specify how many blocks you want to remember), geth keeps or discards some of the tries.

The different tries are the world state trie (maps addresses to accounts), the storage tries (one per contract account, holding its data), the transaction tries, and the receipt tries (for transaction receipts).

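To get the value stored under a key such as "sample value" (for example a contract address) from the state or storage tries, you walk down the trie along the path given by sha3("sample value"). The hash is 32 bytes long, i.e. 64 hex characters (nibbles), and each nibble selects one child, so the path is at most 64 steps long.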
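Turning a key into its nibble path can be sketched as follows. One big caveat: Ethereum hashes keys with keccak256, which Node's built-in crypto module does not provide, so SHA-256 is used here purely as a stand-in that also yields 32 bytes. The digest (and therefore the path) will not match the real trie. The names `hashKey` and `toNibbles` are made up for the example.

```javascript
const crypto = require("crypto");

// Hash a key to 32 bytes.
// ASSUMPTION: SHA-256 is a stand-in; Ethereum uses keccak256 here.
function hashKey(key) {
  return crypto.createHash("sha256").update(key).digest();
}

// Split bytes into 4-bit nibbles (high nibble first). Each nibble picks
// one of the 16 children at a branch node.
function toNibbles(bytes) {
  const nibbles = [];
  for (const byte of bytes) {
    nibbles.push(byte >> 4, byte & 0x0f);
  }
  return nibbles;
}

const path = toNibbles(hashKey("sample value"));
console.log(path.length); // 64 nibbles for a 32-byte hash
console.log(path.slice(0, 8)); // the first few steps down the trie
```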

To better understand which data is stored in the db and how the tries are built, look at these two pictures:
