[Ethereum] Do we have a full archive node after a geth fast sync

fast-sync go-ethereum

I have read the comments in the fast sync PR and I am a little confused: Péter Szilágyi states that after a fast sync the user will have a full archive node with all historical states.

"This allows a fast synced node to act as a full archive node from all intents and purposes."

My understanding of the fast sync is that geth is downloading some historical states on the way to the pivot block (state download running in parallel to the header/receipt download), but the complete state download will only happen at the pivot point. From there on it continues with the full sync.

Hence, in the best case, we have some historical states before the pivot point and a full history after that, right? Can someone please confirm/correct my mental model?

Best Answer

I have the same understanding as you: the fast synced node will have all blocks, but state only for the last 64 blocks and onwards. Maybe what the author meant is "This allows a fast synced node to act as a full archive node from all intents and purposes, except there will be no state data for blocks earlier than the pivot block" :)

In geth 1.8 pruning is enabled by default (https://github.com/ethereum/go-ethereum/releases/tag/v1.8.0). This means that even after the fast sync is complete, the node will not maintain historical state. To disable pruning, use --gcmode=archive. As I understand it, this can become a problem: if you start a fast sync and the network moves ahead in the meantime, most nodes with pruning enabled will have pruned the state data of your pivot block. You can then continue syncing only by pulling data from nodes that have pruning disabled.

Tracing and pruning: By default, state for the last 128 blocks kept in memory. Most states are garbage collected. If you are running a block explorer or other service relying on transaction tracing without an archive node (--gcmode=archive), you need to trace within this window! Alternatively, specify the "reexec" tracer option to allow regenerating historical state; and ideally switch to chain tracing which amortizes overhead across all traced blocks.
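
To make the tracing window in the note above concrete, here is a minimal Go sketch that checks whether a block still falls inside the most recent 128 states. The function name, the constant name, and the exact boundary handling are assumptions made for illustration; this is not geth's API, and the actual retention logic inside geth may differ.

```go
package main

import "fmt"

// retainedStates is the window size taken from the note above (128 blocks).
// Treat it as an assumption for any other geth version or configuration.
const retainedStates = 128

// stateLikelyRetained reports whether the state of block `target` falls inside
// the most recent `retainedStates` blocks, counted back from `head` inclusive.
// Hypothetical helper for illustration only.
func stateLikelyRetained(head, target uint64) bool {
	if target > head {
		return false // the node has not reached this block yet
	}
	return head-target < retainedStates
}

func main() {
	head := uint64(5000000) // hypothetical current head
	for _, target := range []uint64{head, head - 127, head - 128, 1} {
		fmt.Printf("block %d inside window: %v\n", target, stateLikelyRetained(head, target))
	}
}
```

For blocks outside this window, the note above points to either running with --gcmode=archive or using the "reexec" tracer option.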

Summary

I downloaded the geth source, modified the source code to specify the fast sync pivot block, compiled the code, removed the old chaindata and started the fast syncing. Once this is complete, I'll be back to running the regular geth binaries.

UPDATE This experiment failed. A few different errors with my hack prevented the blockchain from fast syncing to the specified block and then syncing normally after it. Back to a full archive node sync.

Does anyone have any suggestions?

Details

I downloaded the source code for geth and modified the section that calculates the fast sync pivot point in eth/downloader/downloader.go, lines 419-441:

case FastSync:
    // Calculate the new fast/slow sync pivot point
    if d.fsPivotLock == nil {
        pivotOffset, err := rand.Int(rand.Reader, big.NewInt(int64(fsPivotInterval)))
        if err != nil {
            panic(fmt.Sprintf("Failed to access crypto random source: %v", err))
        }
        if height > uint64(fsMinFullBlocks)+pivotOffset.Uint64() {
            pivot = height - uint64(fsMinFullBlocks) - pivotOffset.Uint64()
        }
    } else {
        // Pivot point locked in, use this and do not pick a new one!
        pivot = d.fsPivotLock.Number.Uint64()
    }
    // If the point is below the origin, move origin back to ensure state download
    if pivot < origin {
        if pivot > 0 {
            origin = pivot - 1
        } else {
            origin = 0
        }
    }
    glog.V(logger.Debug).Infof("Fast syncing until pivot block #%d", pivot)

I changed Debug to Info in the last line above and added the following lines below it:

    glog.V(logger.Info).Infof("Fast syncing until pivot block #%d", pivot)
    if pivot >= 2394190 {
        pivot = 2394190
    }
    glog.V(logger.Info).Infof("Fast syncing until modified pivot block #%d", pivot)
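
The pivot arithmetic and the cap added above can be tried in isolation with a small standalone sketch. The constants minFullBlocks and pivotInterval are placeholder values standing in for fsMinFullBlocks and fsPivotInterval (the real values live in eth/downloader and vary between geth versions), and randomPivot and capPivot are hypothetical helper names, not functions from geth. Per the UPDATE above, the hack itself did not produce a working sync, so this only illustrates the calculation.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// Placeholder values standing in for fsMinFullBlocks and fsPivotInterval.
// These are assumptions, not the constants of any particular geth release.
const (
	minFullBlocks = 1024
	pivotInterval = 512
)

// randomPivot follows the quoted logic: choose a random offset in
// [0, pivotInterval) and place the pivot that far below height-minFullBlocks.
// It returns 0 when the chain is too short for a pivot.
func randomPivot(height uint64) (uint64, error) {
	offset, err := rand.Int(rand.Reader, big.NewInt(pivotInterval))
	if err != nil {
		return 0, fmt.Errorf("failed to access crypto random source: %w", err)
	}
	if height > minFullBlocks+offset.Uint64() {
		return height - minFullBlocks - offset.Uint64(), nil
	}
	return 0, nil
}

// capPivot applies the manual override: never let the pivot exceed maxPivot.
func capPivot(pivot, maxPivot uint64) uint64 {
	if pivot >= maxPivot {
		return maxPivot
	}
	return pivot
}

func main() {
	pivot, err := randomPivot(2665500) // hypothetical chain height
	if err != nil {
		panic(err)
	}
	fmt.Println("random pivot:", pivot)
	fmt.Println("capped pivot:", capPivot(pivot, 2394190))
}
```

Returning the error instead of panicking is a choice made for this sketch; the quoted geth code panics when the random source is unavailable.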

I recompiled and started off the fast sync process using the modified binaries:

Iota:go-ethereum user$ make geth
...
Done building.
Run "build/bin/geth" to launch geth.

I checked the version of the modified geth:

Iota:go-ethereum user$ build/bin/geth version
Geth
Version: 1.5.3-unstable

I removed the old damaged chaindata:

Iota:go-ethereum user$ build/bin/geth removedb
/Users/bok/Library/Ethereum/chaindata
Remove this database? [y/N] y

Removing...
Removed in 35.242291ms

I started the fast sync:

Iota:go-ethereum user$ build/bin/geth --fast --cache=1024 console
I1120 23:44:44.870142 ethdb/database.go:83] Allotted 1024MB cache and 1024 file handles to /Users/user/Library/Ethereum/geth/chaindata
I1120 23:44:44.878926 ethdb/database.go:176] closed db:/Users/user/Library/Ethereum/geth/chaindata
...
I1121 08:33:51.340811 eth/downloader/downloader.go:441] Fast syncing until pivot block #2664150
I1121 08:33:51.340847 eth/downloader/downloader.go:445] Fast syncing until modified pivot block #2394190

After the fast syncing is complete, I'll go back to using the regular geth binaries.

[Ethereum] Why is Geth’s “fast” sync now the default, whereas before it wasn’t

The default is now "snap" sync which completes in about 4 or 5 hours for me on a SATA SSD.
