go-ethereum – How to Scrape Ethereum Blockchain in Human-Readable Format (CSV) – Step-by-Step Instructions

blockchain go-ethereum pyethereum transactions

I wanted to scrape the whole Ethereum blockchain into CSV format. To do this, I reverse-engineered and extended this code by Vitalik: https://github.com/ethereum/research/blob/master/uncle_regressions/block_datadump_generator.py

My version is here:
https://github.com/ankitchiplunkar/analyzeEthereum

The problem is that I can download the blockchain up to block 2,675,000 (the EIP 155 hard fork), but after that I get the following error:

>>> tempTxDictionary = tempTx.to_dict()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/******/anaconda2/envs/pyethapp/lib/python2.7/site-packages/ethereum/transactions.py",
line 155, in to_dict
d['sender'] = self.sender
File "/home/******/anaconda2/envs/pyethapp/lib/python2.7/site-packages/ethereum/transactions.py",
line 81, in sender
raise InvalidTransaction("Invalid signature values!")
ethereum.exceptions.InvalidTransaction: Invalid signature values!

On a closer look I can see that

tempTx.v = 37 or 38

The yellow paper says that "v" should be either 27 or 28 (equation 211).
There is something different in how pyethereum/transactions.py is reading the block after EIP155.

What should I do to fix this?

PS: here is a simpler method to recreate the error

import rlp
from ethereum.blocks import BlockHeader
from ethereum.transactions import Transaction
from ethereum import utils
import csv

gethDumpFileName = 'geth.dump'
f = open(gethDumpFileName, 'rb')  # the dump is raw RLP, so read it as binary
pos = 3561286829  # position of block 2,675,002
f.seek(pos)
# Read enough bytes to parse the RLP length prefix of the block
prefix = f.read(10)
_typ, _len, _pos = rlp.codec.consume_length_prefix(prefix, 0)
# Read the remainder of the block
blkdata = prefix + f.read(_pos + _len - 10)
header = rlp.decode(rlp.descend(blkdata, 0), BlockHeader)
headerDictionary = header.to_dict()
i = 0  # index of the transaction within the block
tempTx = rlp.decode(rlp.descend(blkdata, 1, i), Transaction)
tempTx.v = tempTx.v - 10  # shifts v from 37/38 to 27/28
tempTxDictionary = tempTx.to_dict()

Best Answer

If I'm not mistaken, this is stated in EIP 155:

If block.number >= FORK_BLKNUM and v = CHAIN_ID * 2 + 35 or v = CHAIN_ID * 2 + 36, then when computing the hash of a transaction for purposes of signing or recovering, instead of hashing only the first six elements (ie. nonce, gasprice, startgas, to, value, data), hash nine elements, with v replaced by CHAIN_ID, r = 0 and s = 0. The currently existing signature scheme using v = 27 and v = 28 remains valid and continues to operate under the same rules as it does now.

So, if CHAIN_ID = 1 (the Ethereum mainnet), then v becomes 1 * 2 + 35 = 37 or 1 * 2 + 36 = 38, which are the values you are seeing.
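
The rule quoted above can be sketched in a few lines of plain Python. This is only an illustration, not pyethereum's code: the helper names `decode_v` and `signing_fields` are made up for this example, and the fields are returned as an ordinary list rather than being RLP-encoded and hashed.

```python
def decode_v(v):
    """Split a signature's v value into (chain_id, recovery_id).

    chain_id is None for pre-EIP-155 signatures (v = 27 or 28).
    recovery_id is 0 or 1.
    Hypothetical helper for illustration, not part of pyethereum.
    """
    if v in (27, 28):
        return None, v - 27
    if v >= 35:
        # EIP 155: v = CHAIN_ID * 2 + 35 or v = CHAIN_ID * 2 + 36
        return (v - 35) // 2, (v - 35) % 2
    raise ValueError("unexpected v value: %r" % (v,))


def signing_fields(nonce, gasprice, startgas, to, value, data, chain_id=None):
    """Return the elements that go into the signing hash.

    Six elements for legacy signatures; nine for EIP 155, where the
    v, r, s slots are filled with CHAIN_ID, 0, 0.
    Hypothetical helper for illustration, not part of pyethereum.
    """
    fields = [nonce, gasprice, startgas, to, value, data]
    if chain_id is not None:
        fields += [chain_id, 0, 0]
    return fields


for v in (27, 28, 37, 38):
    print(v, decode_v(v))
```

Note that subtracting 10 from v is not enough on its own: for an EIP 155 transaction the sender has to be recovered from the nine-element hash, so a library that only hashes the first six elements will recover the wrong address even when v looks valid.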