Several sources mention the idea of state-trie pruning. What is that, and is it currently being implemented on the network? Is this a manual process or can it be done automatically? If Ethereum is currently processing 1 GB per month under the current volume, and could grow with greater adoption, can state-trie pruning prevent the size of the ethereum blockchain from becoming unwieldy? Some sources seem concerned.
[Ethereum] What is state-trie pruning and how does it work
blockchain state-trie-pruning
Related Solutions
geth and parity use different methods to store the Ethereum blockchain in their internal formats. I ran many benchmarks because I found that syncing takes very long when all I want is to use a wallet.
The pruning mode determines how the block data is saved. In archive mode, all states are saved, so you know the state at every moment without having to replay the whole blockchain. With fast and light, we assume that we don't need all this information, only the current state and a few before it, so many intermediate states are removed.
On geth, the --fast method saves the state of the blockchain at block B[-1500] and all the states after that block (B[-1] is the last block, B[-2] is the one before it, and so on). So it is possible to rewind to the state of any of the last 1500 blocks. With a full blockchain, you can do this for any block.
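To make the B[-1500] window concrete, here is a minimal sketch of which block states would be available to rewind to. It only models the rule described above; the function names and the `window` parameter are hypothetical and are not part of geth.

```python
def retained_state_blocks(head: int, window: int = 1500) -> range:
    """Block numbers whose state is kept, from B[-window] up to B[-1] (the head)."""
    # B[-1] is the head itself, so B[-window] is head - (window - 1).
    oldest = max(0, head - window + 1)
    return range(oldest, head + 1)


def can_rewind_to(block: int, head: int, window: int = 1500, archive: bool = False) -> bool:
    """True if the state at `block` is still stored locally."""
    if archive:
        # Archive mode keeps every state since genesis.
        return 0 <= block <= head
    return block in retained_state_blocks(head, window)
```

With a head at block 1,600,000, the pruned node in this model can rewind to block 1,598,501 but not to block 1,598,500, while an archive node can rewind to both.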
In parity, there are 4 pruning modes (journalDB algorithms):
- Archive (Archive): like geth's archive mode, all states are kept
- Fast (OverlayRecent): like geth's fast mode, the full states of the B[-i] blocks are kept
- Light (EarlyMerge): the states of the B[-i] blocks are kept, but as diffs (so it is smaller than fast, but access is slower)
- Basic (RefCounted): the states of the B[-i] blocks are kept as with OverlayRecent, but states are removed after x changes, so only the last x changes remain
Benchmarks done on an i7 3720QM with 16GB RAM, using Geth 1.4.4 (Homestead with 1.6M blocks)
| Option | Disk Used | Time | Disk Written |
|--------|-----------|------|--------------|
| none   | 19.GB     | 5h00 | 1TB          |
| fast   | 3.7GB     | 1h00 | 100GB        |
Benchmarks done on an i7 3720QM with 16GB RAM, using Geth 1.5.0 unstable (Homestead with 1.6M blocks), found at https://gitter.im/ethereum/go-ethereum?at=574d26c010f0fed86f49b32f
| Command     | Disk Used | Time | Disk Written |
|-------------|-----------|------|--------------|
| geth        | 21GB      | 5h00 | 150GB        |
| geth --fast | 4.2GB     | 21m  | 35GB         |
| geth export | 1.5GB     | 10m  |              |
| geth import | 21GB      | 3h30 |              |
Benchmarks done on an i7 3720QM with 16GB RAM, using Parity 1.2 (Homestead with 1.6M blocks)
| Option  | Disk Used | Time | Disk Written |
|---------|-----------|------|--------------|
| archive | 19.GB     | 2h00 | 300GB        |
| fast    | 3.7GB     | 1h30 | 20GB         |
| light   | 2.5GB     | 2h00 | 130GB        |
Note: once you have a node with a synced blockchain, you can copy geth's chaindata directory and use it on your other computers. I checked this with Linux, Windows and OS X.
Note: using --cache with 1024 could make it faster, but the difference was not significant on my system. The same goes for --jitvm.
Note: the Ethereum blockchain stores the final state after transactions, but it is safer to replay the transactions to verify them.
Transaction Tries and Transaction Receipt Tries are indeed independent data structures. Each has its own root stored in the block header, and they differ in both purpose and content.
Purpose:
Transaction Tries: records transaction request vectors
Transaction Receipt Tries: records the transaction outcome
Content:
Parameters used in composing a Transaction Trie [details in section 4.3 of the yellow paper]:
- nonce,
- gas price,
- gas limit,
- recipient,
- transfer value,
- transaction signature values, and
- account initialization (if transaction is of contract creation type), or transaction data (if transaction is a message call)
Parameters used in composing a Transaction Receipt Trie [details in section 4.4.1 of the yellow paper]:
- post-transaction state,
- the cumulative gas used,
- the set of logs created through execution of the transaction, and
- the Bloom filter composed from information in those logs
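As a rough illustration of the two lists above, the sketch below models one entry of each trie as a plain record and indexes both by the transaction's position in the block. All class, field, and function names are hypothetical. Real clients RLP-encode these entries and store them in Merkle Patricia tries whose roots go into the block header; that encoding and hashing is deliberately left out here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Transaction:
    """What was requested (fields follow the Transaction Trie list above)."""
    nonce: int
    gas_price: int
    gas_limit: int
    recipient: Optional[bytes]  # None marks a contract-creation transaction
    value: int
    v: int                      # signature values
    r: int
    s: int
    payload: bytes = b""        # init code (creation) or call data (message call)

    @property
    def is_contract_creation(self) -> bool:
        return self.recipient is None


@dataclass(frozen=True)
class Receipt:
    """What happened (fields follow the Transaction Receipt Trie list above)."""
    post_state: bytes
    cumulative_gas_used: int
    logs: Tuple[bytes, ...]
    logs_bloom: bytes


def index_block(
    txs: List[Transaction], receipts: List[Receipt]
) -> Tuple[Dict[int, Transaction], Dict[int, Receipt]]:
    """Build two independent mappings keyed by transaction index."""
    if len(txs) != len(receipts):
        raise ValueError("every transaction needs exactly one receipt")
    return dict(enumerate(txs)), dict(enumerate(receipts))
```

The point of the sketch is only that the same index leads to two different records: the request in one structure and its outcome in the other.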
Best Answer
It is a similar concept to garbage collection in programming languages and in tree-based version control systems like git. When Ethereum contracts run, they modify their state. Since the state tree is an immutable append-only structure, every time the state is modified it gets a new state root. Some elements that were reachable from the old roots may not be reachable from the new root (due to operations that delete or modify entries). Theoretically, they can be pruned (garbage collected). However, since the Proof-of-Work consensus as it stands does not define when state transitions are final, there is always a theoretical possibility that the state will be reverted to older roots and that things that were pruned would be needed again. So pruning is currently a trade-off. We might say, for example, that after 5000 blocks we assume the state won't be reverted, and prune all unreachable nodes. Someone might still want to disable this feature to keep the entire history of the blockchain for special purposes.
Here is a very detailed description: https://blog.ethereum.org/2015/06/26/state-tree-pruning/
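The garbage-collection analogy can be sketched as a simple mark-and-sweep over a node store. This is a toy model under stated assumptions, not how any client actually implements pruning: nodes are identified by made-up string ids instead of hashes, the store is an in-memory dict, and `prune` and its `keep` parameter are hypothetical names.

```python
from typing import Dict, List, Set

NodeStore = Dict[str, List[str]]  # node id -> ids of its child nodes


def reachable(store: NodeStore, root: str) -> Set[str]:
    """Mark phase: collect every node reachable from one state root."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen or node not in store:
            continue
        seen.add(node)
        stack.extend(store[node])
    return seen


def prune(store: NodeStore, roots_by_block: Dict[int, str], head: int, keep: int) -> Set[str]:
    """Sweep phase: delete nodes not reachable from the roots of the last `keep` blocks."""
    live: Set[str] = set()
    for number, root in roots_by_block.items():
        if number > head - keep:  # block is recent enough to be reverted to
            live |= reachable(store, root)
    dead = set(store) - live
    for node in dead:
        del store[node]
    return dead


# Three blocks; each new root reuses unchanged nodes from the previous state.
store: NodeStore = {
    "r1": ["a", "b"], "r2": ["a", "b2"], "r3": ["a2", "b2"],
    "a": [], "a2": [], "b": [], "b2": [],
}
roots = {1: "r1", 2: "r2", 3: "r3"}
removed = prune(store, roots, head=3, keep=2)
```

Here only `r1` and `b` are removed, because nothing in blocks 2 or 3 points at them anymore. Node `a` survives since block 2 still references it, which is why a revert to block 2 would still work while a revert to block 1 would not.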