Thanks for accepting the challenge of running a full archive node! :D
1) what are the drivers for the speed of the syncing in archive mode in parity ? Is that CPU ? Disk (hdd vs ssd) ? Memory (I also tried increasing the cache size to 4GB but it did not help) ? Network (i have a fast connection and it was sufficient until recently) ?
The main bottleneck is disk I/O, followed by available memory for caching, then bandwidth, then CPU. In practice it usually comes down to I/O, and since you are using an HDD, this is probably already your key issue. You can tune the database for a spinning disk by running with --db-compaction hdd. In addition, depending on your available memory, you can increase the cache to something large like 4 GB with --cache-size 4096, or even more; this might reduce some I/O activity.
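To make the combination of flags concrete, here is a minimal sketch that assembles such a command line. The helper name build_parity_args is made up for illustration, and adding --pruning archive is my assumption about how the archive node is started; adjust it to your own setup.

```python
from typing import List


def build_parity_args(disk: str = "hdd", cache_mb: int = 4096) -> List[str]:
    """Assemble a Parity command line for an archive node (illustrative helper)."""
    if disk not in ("hdd", "ssd"):
        raise ValueError("disk must be 'hdd' or 'ssd'")
    if cache_mb <= 0:
        raise ValueError("cache_mb must be a positive number of megabytes")
    return [
        "parity",
        "--pruning", "archive",      # assumption: archive mode is selected this way
        "--db-compaction", disk,     # database compaction profile for the disk type
        "--cache-size", str(cache_mb),  # total cache size in MB
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it by hand.
    print(" ".join(build_parity_args("hdd", 4096)))
```

On a machine with more memory you would pass a larger cache_mb, but as noted above this only helps to the extent that it takes pressure off the disk.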
2) Could my move from ssd to hdd be the cause of this performance drop ? In the past i had no problem catching up so that's a new problem for me.
Yes, I fear you will not have much luck syncing the archive on an HDD.
3) Alternatively does there exists things like "checkpoints" of the full chain which i could download directly from a trusted source (like parity tech themselves perhaps) ?
No :)
But, if possible, you could get your hands on a fast machine with flash drives or fast SSDs and plenty of memory, do the full archive sync there, and copy the DB to your HDD after completion.
4) Perhaps orthogonal to the question itself but what i need is the Events logs recorded by Smart Contracts when they are executed. Is it really necessary for me to keep the full states when i sync the chain in order to get those ?
Not sure if I can answer that question directly, but if you are looking at future events, you could simply warp-sync to the latest block and enable archive pruning mode for all future blocks.
5) assuming the issue is with the I/O of the drive. Is there a way for me to keep the bulk of the chain db on a slow drive (like external hdd) while using my ssd only for performing the syncing of recent blocks only (like the last 100000 would be enough for my purpose) ?
Not that I'm aware of. Please follow https://github.com/paritytech/parity/issues/6280 if possible.
Best Answer
Syncing the Ethereum blockchain with Geth in --fast mode has two phases running in parallel: block sync and state trie download. Both phases need to complete before you have a full node and Geth switches to full mode, where every transaction is executed and verified.
The block sync downloads all the block information (headers, transactions). This phase uses a lot of CPU and disk space to store all the data. You can observe this process in the logs with the mention of "Importing block headers and block receipts".
However, in fast mode no transactions are executed, so no account state is available (i.e. balances, nonces, smart contract code and data). Geth needs to download the state trie and cross-check it against the latest block. This phase is called state trie download and usually takes longer than the block sync. This phase is described in the logs by the following statements:
The charts below show some metrics during the syncing process. We can observe that once the block sync has finished, we are storing less data and consuming less CPU and memory. However, Geth is still downloading and writing the state entries at a high rate.
When you are between 64 and 128 blocks behind, it usually means you have finished the block sync phase and are still in the state trie download phase. During that phase, the block number will always oscillate between 64 and 128 blocks behind the latest block mined on Ethereum. This is normal until the state trie download phase ends and your node is fully synced.
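As a rough illustration of that rule of thumb, the sketch below classifies a node from how far it is behind the head. The function name sync_phase and the labels it returns are made up for this example, and treating a gap below 64 as "synced" is my assumption; the only inputs are your node's current block number and the latest block number on the network, however you obtain them.

```python
def sync_phase(local_block: int, head_block: int) -> str:
    """Guess the fast-sync phase from the distance to the chain head (heuristic only)."""
    gap = head_block - local_block
    if gap < 0:
        raise ValueError("local_block cannot be ahead of head_block")
    if gap < 64:
        # Assumption: closer than 64 blocks means the node has left fast sync.
        return "synced or nearly synced"
    if gap <= 128:
        # The 64 to 128 window described above: blocks are done, state is not.
        return "state trie download"
    return "block sync"


if __name__ == "__main__":
    print(sync_phase(local_block=9_000_000, head_block=9_000_100))
```

This is only a heuristic: a node that is 100 blocks behind for a few seconds after a restart would be misclassified, so look at the trend over several minutes rather than a single reading.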
To know how close you are to the end of the state trie download, compare the value of processed=x (latest state downloaded) with the size of the trie. It is hard to get the exact size, as it grows all the time. In this recent comment, it was mentioned that the trie has around 475,000,000 state entries.
However, with an HDD, you might not be able to sustain a disk write rate high enough to catch up with the head (latest state entry).
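The processed=x comparison can be sketched in a few lines. This is only an estimate under stated assumptions: the 475,000,000 figure is the approximate trie size quoted above and keeps growing, the sample log line is made up for illustration rather than copied from a real node, and the function name state_progress is hypothetical. The pattern also tolerates thousands separators in case your Geth version prints them.

```python
import re
from typing import Optional

# Approximate trie size quoted above; it grows over time, so treat it as a floor.
ESTIMATED_TRIE_ENTRIES = 475_000_000

# Matches "processed=123456" and "processed=123,456".
_PROCESSED = re.compile(r"\bprocessed=(\d[\d,]*)")


def state_progress(log_line: str,
                   trie_size: int = ESTIMATED_TRIE_ENTRIES) -> Optional[float]:
    """Return the fraction of state entries downloaded, or None if not found."""
    match = _PROCESSED.search(log_line)
    if match is None:
        return None
    processed = int(match.group(1).replace(",", ""))
    # Cap at 1.0 because the real trie may be larger than the estimate.
    return min(processed / trie_size, 1.0)


if __name__ == "__main__":
    # Illustrative line only; the surrounding fields are assumptions.
    sample = "Imported new state entries count=1024 processed=237500000 pending=5000"
    progress = state_progress(sample)
    print(f"{progress:.1%} of the estimated trie downloaded")
```

Because the real trie is likely larger than the estimate by the time you read this, a reported 100% does not guarantee that the download is finished; the node itself will tell you when it switches to full mode.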
This answer is inspired by my article Running an Ethereum Full Node on a RaspberryPi 4 (model B)