[Ethereum] Difference between a pruned and unpruned blockchain

blockchain, light-clients, state-trie-pruning, synchronization

At the Berlin Blockchain Meetup, Gustav Simonsson teased the upcoming Homestead release (coming soon™), and we discussed blockchain bloat and the current size of the Ethereum blockchain.


We discussed the pruned and unpruned blockchain and the geth fast sync option. Now I wonder:

  1. What is the difference between a pruned and unpruned blockchain? Does the pruned blockchain still consist of blocks? Is it even a "block chain"?
  2. What is the difference in space requirements? The full, unpruned chain is currently 7GB in size. How much space does a pruned chain require?
  3. How do pruned fast-sync clients compare to light clients? If the space requirements are lower, isn't it better to use fast-sync clients rather than unsafe light clients?

Could pruned chain clients be considered as light clients?

Best Answer

Let's take it one step at a time.

Blockchains generally work by having an origin (genesis) state in which a few accounts hold funds; every block you place on top of the chain then moves those funds around, also granting a bit extra to the miner. Whenever you import a new block into your existing chain, you take your current view (state) of the world, transform that state according to the transactions contained in the block, and arrive at a new view of what you believe the world looks like. You don't discard your past view of the world, because if there is a fork in the blockchain (e.g. a miner turns up with a better block, or maybe two better blocks), you need to transform your view from that past state into the better version. This leads to all past states you transitioned through being accumulated for eternity. This is an unpruned state/blockchain, and it is currently 7GB for Ethereum.
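To make the "every block transforms the previous state" idea concrete, here is a minimal toy sketch, assuming a drastically simplified model: accounts are plain balances, a block is a list of `(sender, recipient, amount)` transfers plus a miner reward, and `Block`, `apply_block`, `UnprunedNode` and `BLOCK_REWARD` are hypothetical names, not geth internals. The node keeps the state after every block, which is exactly what lets it apply a competing fork block on top of an older state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

BLOCK_REWARD = 5  # illustrative value only, not Ethereum's real block reward


@dataclass
class Block:
    block_id: str
    parent_id: Optional[str]
    miner: str
    transfers: List[Tuple[str, str, int]] = field(default_factory=list)


def apply_block(state: Dict[str, int], block: Block) -> Dict[str, int]:
    """Return the new world view after applying the block's transfers."""
    new_state = dict(state)  # copy, so the parent's state stays untouched
    for sender, recipient, amount in block.transfers:
        if new_state.get(sender, 0) < amount:
            raise ValueError(f"{sender} cannot pay {amount}")
        new_state[sender] -= amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    # The miner gets a bit extra on top of the moved funds.
    new_state[block.miner] = new_state.get(block.miner, 0) + BLOCK_REWARD
    return new_state


class UnprunedNode:
    """Keeps the state after *every* block, so any past state can be queried."""

    def __init__(self, genesis_state: Dict[str, int]):
        self.states: Dict[str, Dict[str, int]] = {"genesis": dict(genesis_state)}

    def import_block(self, block: Block) -> None:
        # Works even for a fork: the parent's old state is still available.
        parent_state = self.states[block.parent_id]
        self.states[block.block_id] = apply_block(parent_state, block)

    def balance_at(self, block_id: str, account: str) -> int:
        return self.states[block_id].get(account, 0)
```

For example, importing two competing children of genesis (`a1` and `b1`) both succeed, because the genesis state was never discarded; the cost is that `states` grows with every block ever seen.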

The important thing to notice is that most of the time you don't care how many funds an account had 3 years ago; you only care about what the state is now (and maybe a few days ago too). So why keep all that extremely old transition state around? State pruning essentially takes all that intermediate state and flushes it down the toilet. Crucially, you only throw away the intermediate world views, never the blocks themselves or any other data whose loss would be unhealthy for the network (i.e. data a joining node needs in order to sync). Thus by pruning your state trie you lose the ability to query the past balances of accounts, but with the benefit of reducing the amount of stored data to about 1/5-1/6 of its original size.
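A minimal sketch of what pruning keeps and what it throws away, under the same kind of toy assumptions: balances only, no transaction validity checks, and a hypothetical `PruningNode` with a `keep` retention window that is purely illustrative (it is not geth's actual pruning parameter). Every block is stored forever; only the per-block states older than the window are discarded, so historical balance queries fail while the current state stays available.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple

Transfer = Tuple[str, str, int]  # toy transfer: (sender, recipient, amount)


class PruningNode:
    """Stores every block, but only the states of the most recent `keep` heights."""

    def __init__(self, genesis_state: Dict[str, int], keep: int = 3):
        self.keep = keep  # hypothetical retention window
        self.blocks: List[List[Transfer]] = []  # never pruned: joining peers need these
        self.states: Dict[int, Dict[str, int]] = OrderedDict()
        self.states[0] = dict(genesis_state)  # height 0 = genesis state
        self.head = 0

    def import_block(self, transfers: List[Transfer]) -> None:
        state = dict(self.states[self.head])
        for sender, recipient, amount in transfers:  # no validity checks in this sketch
            state[sender] = state.get(sender, 0) - amount
            state[recipient] = state.get(recipient, 0) + amount
        self.blocks.append(list(transfers))
        self.head += 1
        self.states[self.head] = state
        # Prune: drop the oldest intermediate states beyond the window.
        while len(self.states) > self.keep:
            self.states.popitem(last=False)

    def balance_at(self, height: int, account: str) -> int:
        if height not in self.states:
            raise KeyError(f"state at block {height} was pruned")
        return self.states[height].get(account, 0)
```

After importing five blocks with `keep=3`, all five blocks are still in `blocks`, the states for heights 3 to 5 can be queried, and asking for the balance at height 1 raises `KeyError`: the same trade-off described above, just at toy scale.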

Ok, so what about fast sync? Following the previous thought pattern, if you don't care about the balance of a random account from 3 years ago, why would you want to replay the entire transaction history of the blockchain just to get to the current state? What fast sync does is download the entire blockchain without executing the transactions to generate the world view one block at a time. Instead it only verifies the proofs-of-work, and once the entire chain is downloaded it looks at the state root (the hash defining the current world view) and downloads the state database directly from the network, reconstructing the final state without needing any of the transient states. This means that besides downloading the blocks, it needs to download additional data, the state trie itself, so it trades bandwidth for processing power (i.e. I download the state rather than generate it). The end result of fast sync is, for all intents and purposes, a pruned database, just reached by different means. The current size of such a database is 1.2-1.3GB.
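The fast-sync idea can be sketched as "check the chain's links and proofs-of-work, then fetch the head state and check it against the state root". This is a toy under loud assumptions: the proof-of-work is a simple SHA-256 hex-prefix check (not Ethash), the state root is a hash of the JSON-encoded balances (Ethereum uses a Merkle Patricia trie root, and real fast sync fetches and verifies the trie node by node), and `Header`, `mine`, `fast_sync` and `fetch_state` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

DIFFICULTY_PREFIX = "000"  # toy PoW target: header hash must start with this


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def state_root(state: Dict[str, int]) -> str:
    """Toy commitment to the world state (stand-in for a trie root)."""
    return sha256_hex(json.dumps(state, sort_keys=True).encode())


@dataclass
class Header:
    parent_hash: str
    state_root: str
    nonce: int

    def hash(self) -> str:
        return sha256_hex(f"{self.parent_hash}|{self.state_root}|{self.nonce}".encode())


def mine(parent_hash: str, root: str) -> Header:
    """Brute-force a nonce meeting the toy PoW (only used to build test data)."""
    nonce = 0
    while True:
        header = Header(parent_hash, root, nonce)
        if header.hash().startswith(DIFFICULTY_PREFIX):
            return header
        nonce += 1


def fast_sync(headers: List[Header],
              fetch_state: Callable[[str], Dict[str, int]]) -> Dict[str, int]:
    """Verify links and PoW only (no transaction replay), then download the head state."""
    parent = "0" * 64  # hypothetical genesis parent hash
    for header in headers:
        if header.parent_hash != parent:
            raise ValueError("broken parent link")
        if not header.hash().startswith(DIFFICULTY_PREFIX):
            raise ValueError("invalid proof-of-work")
        parent = header.hash()
    head_root = headers[-1].state_root
    state = fetch_state(head_root)  # e.g. requested from peers
    if state_root(state) != head_root:
        raise ValueError("downloaded state does not match the state root")
    return state
```

The design point mirrors the answer: the node spends bandwidth on `fetch_state` instead of CPU on replaying transactions, and the state root in the already-verified head is what lets it trust the downloaded state.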
