[Ethereum] the parity light pruning mode

Tags: fast-sync, light-clients, parity, state-trie-pruning, synchronization

Parity offers four different pruning methods: archive, basic, fast and light:

  --pruning METHOD         Configure pruning of the state/storage trie. METHOD
                           may be one of auto, archive, basic, fast, light:
                           archive - keep all state trie data. No pruning.
                           basic - reference count in disk DB. Slow but light.
                           fast - maintain journal overlay. Fast but 50MB used.
                           light - early merges with partial tracking. Fast
                           and light. Experimental!
                           auto - use the method most recently synced or
                           default to archive if none synced [default: auto].

I just tried out archive because I have enough space and time to do a full sync. Its size on disk compares to a geth chain as follows:

 ~ $ du -hs .parity/
17G .parity/
 ~ $ du -hs .ethereum/
15G .ethereum/

Now, I see that archive is pretty much like the default syncing in eth or geth. But what is basic, and what is light? And is fast at the same level as geth --fast?

What is the parity light pruning mode? Can it already be considered a light client?

Best Answer

geth and parity use different methods to save the Ethereum blockchain in their internal formats. I ran many benchmarks because I found that syncing took too long just to use a wallet.

The pruning mode is how the block data is saved. In archive mode, all states are saved, so you know the state at every moment without having to replay the whole blockchain. With fast and light, we assume that we don't need all of this information, only the current state and a few states before it, so many intermediate states are removed.

In geth, the --fast method saves the state of the blockchain at block B[-1500] and all the states after that block (B[-1] is the last block, B[-2] is the second-to-last block, and so on). So it is possible to rewind to the state of any of the last 1500 blocks. With a full blockchain, you can do this for all blocks.
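The retention window described above can be illustrated with a toy model. This is only a sketch of the idea of keeping states for the most recent blocks; the class name `RecentStateStore`, its methods, and the tiny window of 3 are hypothetical and are not taken from geth's code.

```python
from collections import OrderedDict


class RecentStateStore:
    """Toy model: keep full state snapshots only for the newest `window` blocks."""

    def __init__(self, window=1500):
        self.window = window
        self.states = OrderedDict()  # block number -> state snapshot

    def commit(self, block_number, state):
        # Store a copy of the state reached after this block.
        self.states[block_number] = dict(state)
        # Drop the oldest snapshots once the retention window is exceeded.
        while len(self.states) > self.window:
            self.states.popitem(last=False)

    def state_at(self, block_number):
        if block_number not in self.states:
            raise KeyError(f"state for block {block_number} was pruned")
        return self.states[block_number]


# Hypothetical usage: a window of 3 blocks instead of 1500, to keep it short.
store = RecentStateStore(window=3)
balance = 0
for n in range(1, 6):
    balance += 10
    store.commit(n, {"alice": balance})
```

After five blocks, only the states of blocks 3, 4 and 5 can still be queried; asking for block 2 raises `KeyError`. An archive node would correspond to a window that never drops anything.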

In parity, there are 4 pruning modes, or journalDB algorithms:

  • Archive (Archive): as with geth Archive, we keep all states
  • Fast (OverlayRecent): as with geth Fast, we keep the full states of the B[-i] blocks
  • Light (EarlyMerge): we keep all the states of the B[-i] blocks, but as differences (so it is smaller than fast, but slower to access)
  • Basic (RefCounted): we keep all the states of the B[-i] blocks as with OverlayRecent, but we remove states after x changes, so we keep only the last x changes
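The --pruning help text describes basic as "reference count in disk DB". As a rough illustration of reference counting in general, here is a toy in-memory store in which a trie node is deleted only when no remaining state refers to it. The class `RefCountedStore` and the node names are hypothetical; this is not Parity's actual journalDB implementation.

```python
class RefCountedStore:
    """Toy model: a node stays stored for as long as some state references it."""

    def __init__(self):
        self.nodes = {}  # key -> [value, reference count]

    def insert(self, key, value):
        if key in self.nodes:
            self.nodes[key][1] += 1  # one more state refers to this node
        else:
            self.nodes[key] = [value, 1]

    def remove(self, key):
        entry = self.nodes[key]
        entry[1] -= 1
        if entry[1] == 0:
            del self.nodes[key]  # no state needs the node any more


db = RefCountedStore()
db.insert("node_a", b"shared")  # referenced by the state of block 1
db.insert("node_a", b"shared")  # referenced again by the state of block 2
db.insert("node_b", b"old")     # referenced only by the state of block 1

# Pruning the state of block 1 releases its references.
db.remove("node_a")
db.remove("node_b")
```

After pruning, `node_a` survives with a count of 1 because block 2 still uses it, while `node_b` is gone.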

Benchmarks done on an i7 3720QM with 16GB RAM, using Geth 1.4.4 (Homestead with 1.6Mi blocks)

_____________________________________________
| Option | Disk Used | Time | Disk Written  |
|--------|-----------|------|---------------|
| none   | 19.GB     | 5h00 | 1TB           |
| fast   | 3.7GB     | 1h00 | 100GB         |
---------------------------------------------

Benchmarks done on an i7 3720QM with 16GB RAM, using Geth 1.5.0 unstable (Homestead with 1.6Mi blocks found at https://gitter.im/ethereum/go-ethereum?at=574d26c010f0fed86f49b32f)

__________________________________________________
| command     | Disk Used | Time | Disk Written  |
|-------------|-----------|------|---------------|
| geth        | 21GB      | 5h00 | 150GB         |
| geth --fast | 4.2GB     | 21m  | 35GB          |
| geth export | 1.5GB     | 10m  |               |
| geth import | 21GB      | 3h30 |               |
--------------------------------------------------
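To make the Geth 1.5.0 numbers easier to compare, the ratios between the default sync and --fast can be computed directly from the two rows above. This is just arithmetic on the table values; the variable names are illustrative.

```python
# Values copied from the Geth 1.5.0 table: default sync vs. geth --fast.
full = {"disk_gb": 21.0, "minutes": 5 * 60, "written_gb": 150.0}
fast = {"disk_gb": 4.2, "minutes": 21, "written_gb": 35.0}

# How many times larger each metric is for the default sync.
ratios = {key: full[key] / fast[key] for key in full}

for key, value in ratios.items():
    print(f"{key}: default sync is {value:.1f}x the --fast sync")
```

On these figures, the default sync uses about 5 times the disk space, takes about 14 times as long, and writes about 4 times as much data as --fast.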

Benchmarks done on an i7 3720QM with 16GB RAM, using Parity 1.2 (Homestead with 1.6Mi blocks)

_____________________________________________
| Option | Disk Used | Time | Disk Written  |
|--------|-----------|------|---------------|
| archive| 19.GB     | 2h00 | 300GB         |
| fast   | 3.7GB     | 1h30 | 20GB          |
| light  | 2.5GB     | 2h00 | 130GB         |
---------------------------------------------

Note: once you have a node with a synced blockchain, you can copy the chaindata from the geth directory to use it on your other computers. I checked this with Linux, Windows and OS X.

Note: using --cache with 1024 could make it faster, but the difference was not significant on my system. The same goes for --jitvm.

Note: the Ethereum blockchain saves the final state after transactions, but it is safer to replay the transactions to check them.
