[Ethereum] Speed of syncing the chain in parity using archive pruning mode

Tags: parity, state-trie-pruning, synchronization, tokens

I am using Parity v1.7.0-beta-5f2cabd-20170727 and I sync the Ethereum chain in archive mode in order to keep all states of smart contracts. The command I use to launch Parity is:

parity --pruning archive

Due to the chain size, I recently moved the Parity DB for the Ethereum chain from my computer's internal SSD to an external HDD. Since then I have been unable to catch up with the chain when I launch Parity (I switched off my computer a few days ago before going on holiday). Although the client successfully connects to peers and proceeds with syncing, the rate at which it acquires blocks is pretty much the same as the rate at which blocks get mined, so I am always "behind".

So my questions are the following:

1) What are the drivers of syncing speed in archive mode in Parity? Is it CPU? Disk (HDD vs. SSD)? Memory (I also tried increasing the cache size to 4 GB, but it did not help)? Network (I have a fast connection, and it was sufficient until recently)?

2) Could my move from SSD to HDD be the cause of this performance drop? In the past I had no problem catching up, so this is a new problem for me.

3) Alternatively, do there exist things like "checkpoints" of the full chain which I could download directly from a trusted source (like Parity Tech themselves, perhaps)?

4) Perhaps orthogonal to the question itself, but what I need is the event logs recorded by smart contracts when they are executed. Is it really necessary for me to keep the full states when I sync the chain in order to get those?

5) Assuming the issue is with the I/O of the drive, is there a way for me to keep the bulk of the chain DB on a slow drive (like an external HDD) while using my SSD only for syncing recent blocks (the last 100000 would be enough for my purpose)?

Thanks very much in advance for your insights!

Best Answer

Thanks for accepting the challenge of running a full archive node! :D

1) What are the drivers of syncing speed in archive mode in Parity? Is it CPU? Disk (HDD vs. SSD)? Memory (I also tried increasing the cache size to 4 GB, but it did not help)? Network (I have a fast connection, and it was sufficient until recently)?

The main bottleneck is disk I/O, followed by memory available for caching, then bandwidth, then CPU. In practice it usually comes down to I/O, and since you are using an HDD, that is probably already your key issue. You can tune the database for a spinning disk by running with --db-compaction hdd. In addition, depending on your available memory, you can increase the cache to something large like 4 GB with --cache-size 4096, or even more; this might reduce some I/O activity.
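As a minimal sketch, the flags mentioned above can be combined with the archive flag from the question into a single launch command. The variable names CACHE_MB and PARITY_CMD are only illustrative, and 4096 is just the cache value discussed here, so adjust it to the memory you actually have. The script only prints the command; it does not start the node.

```shell
# Cache size in MB (illustrative value from this thread; adjust to your RAM).
CACHE_MB=4096

# Archive pruning, HDD-tuned database compaction, and a larger cache.
PARITY_CMD="parity --pruning archive --db-compaction hdd --cache-size ${CACHE_MB}"

# Dry run: print the command so it can be checked before launching.
echo "$PARITY_CMD"

# To actually start the node, uncomment the next line:
# $PARITY_CMD
```

Whether this is enough to keep up on an external HDD is not guaranteed; it only reduces the I/O pressure somewhat.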

2) Could my move from SSD to HDD be the cause of this performance drop? In the past I had no problem catching up, so this is a new problem for me.

Yes. I fear you will not have much luck syncing the archive on an HDD.

3) Alternatively, do there exist things like "checkpoints" of the full chain which I could download directly from a trusted source (like Parity Tech themselves, perhaps)?

No :)

But if you can get your hands on a fast machine with flash drives or fast SSDs and plenty of memory, you could do the full archive sync there and copy the DB to your HDD after completion.

4) Perhaps orthogonal to the question itself, but what I need is the event logs recorded by smart contracts when they are executed. Is it really necessary for me to keep the full states when I sync the chain in order to get those?

Not sure if I can answer that question directly, but if you are only interested in future events, you could simply warp-sync to the latest block and enable archive pruning mode for all future blocks.
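For reading the event logs themselves, a rough sketch of a request using the standard eth_getLogs JSON-RPC method is shown here. Everything specific in it is an assumption for illustration: the endpoint http://localhost:8545, the all-zero placeholder contract address, and the starting block 0x3d0900 (4,000,000). The script only builds and prints the request body, and whether your node can serve the block range you need is something to check against your own setup.

```shell
# Assumed values, not taken from the question:
RPC_URL="http://localhost:8545"                          # assumed local JSON-RPC endpoint
CONTRACT="0x0000000000000000000000000000000000000000"    # hypothetical contract address
FROM_BLOCK="0x3d0900"                                    # block 4,000,000 in hex, arbitrary example

# Build an eth_getLogs request filtered by block range and contract address.
PAYLOAD=$(printf '{"jsonrpc":"2.0","id":1,"method":"eth_getLogs","params":[{"fromBlock":"%s","toBlock":"latest","address":"%s"}]}' "$FROM_BLOCK" "$CONTRACT")

# Print the request body for inspection.
echo "$PAYLOAD"

# To send it to a running node, uncomment the next line:
# curl -s -X POST -H "Content-Type: application/json" --data "$PAYLOAD" "$RPC_URL"
```

Narrowing the block range keeps each response small, which matters when the node is already I/O-bound.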

5) Assuming the issue is with the I/O of the drive, is there a way for me to keep the bulk of the chain DB on a slow drive (like an external HDD) while using my SSD only for syncing recent blocks (the last 100000 would be enough for my purpose)?

Not that I'm aware of. Please follow https://github.com/paritytech/parity/issues/6280 if possible.
