Erigon2 prototype
After checking out the code from the desired branch, run the following commands:
make state
./build/bin/state erigon2 --datadir <your_datadir>
The directory referenced by the --datadir option needs to contain block headers, block bodies, and recovered senders downloaded and computed by a recent version of Erigon. Not all stages are required, only the first four: Headers, BlockHashes, Bodies, Senders.
The prototype starts replaying blocks and their transactions starting from genesis, and then block 1, block 2, and so on. Every 1000 blocks it prints the progress, like so:
INFO[01-16|20:12:15.536] Processed blocks=133000
INFO[01-16|20:12:15.848] Processed blocks=134000
INFO[01-16|20:12:16.162] Processed blocks=135000
INFO[01-16|20:12:16.914] Processed blocks=136000
INFO[01-16|20:12:17.233] Processed blocks=137000
INFO[01-16|20:12:17.561] Processed blocks=138000
It is possible to interrupt the prototype by pressing Ctrl-C on the console, or by sending the SIGTERM (-15) signal to the process on Unix. When this interruption happens, information similar to the following is printed:
INFO[01-16|20:28:21.169] Processed blocks=956000
^CINFO[01-16|20:28:21.926] Got interrupt, shutting down...
INFO[01-16|20:28:21.926] Got interrupt, shutting down...
INFO[01-16|20:28:21.926] interrupted, please wait for cleanup, next time start with --block 956723
This information helps resume the prototype from the point it was interrupted, instead of starting from the beginning, like so (in our example):
./build/bin/state erigon2 --datadir <your_datadir> --block 956723
The replaying will resume from where it was interrupted, like so:
./build/bin/state erigon2 --datadir ~/mainnet --block 956723
INFO[01-16|20:28:59.088] Processed blocks=957000
INFO[01-16|20:28:59.833] Processed blocks=958000
INFO[01-16|20:29:05.028] Processed blocks=959000
INFO[01-16|20:29:05.729] Processed blocks=960000
The prototype creates and modifies files in two directories:
<your_datadir>\aggregator
<your_datadir>\statedb
Files in the aggregator directory are of the following three types:
- Change files (extension .chg). These files are created for 4 groups of content: accounts, storage, code, and commitment; the group can be recognised by the first part of the file name. Within each group of content, there are 3 possible "sequences": keys, before values, and after values. By default, only keys and after values are written. Each file corresponds to an interval of up to 4096 blocks; the starting and ending blocks (exclusive) are part of the file name. Change files can be thought of as "Write Ahead Log" (WAL) files that contain the recent history of changes in the state. The combination of keys and after values is then used to aggregate the changes into the data files described next. After aggregation, change files are removed. When enabled, before value files are planned to be used for unwinding the state, as well as for creating the change history, though neither of these two features is implemented yet.
- Data files (extension .dat). These files, like change files, are created for 4 groups of content: accounts, storage, code, and commitment. They also correspond to an interval of blocks, but unlike change files, the interval can be larger than 4096 blocks. Currently, it can be 8192, 16384, and so on. Initially, data files for 4096-block intervals are created by aggregating the content of the corresponding change files. Then, individual data files can be merged with one another to form larger and larger data files. The plan is for data files to be seeded via Content Delivery Networks.
- Index files (extension .idx). Every index file corresponds to its data file and has almost the same file name, apart from the extension. Index files contain the serialised representation of the minimal perfect hash table, the offset table (with each key or value in the data file having an offset in that table), and optionally a compressed mapping for accessing the data file as an array of items. Index files are not going to be shared via Content Delivery Networks, but instead created locally. To prevent potential DoS vulnerabilities, there is a plan for each Erigon2 node to build the perfect hash table part from a randomly generated seed, so that no specific seed can be exploited.
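The block-interval arithmetic behind change files and data files can be illustrated with a short Go sketch. The names interval, changeInterval, and merge are hypothetical, and the sketch assumes half-open intervals aligned to multiples of 4096 and merging of two adjacent intervals of equal size; the actual file naming and merge policy of the prototype are not reproduced here.

```go
package main

import "fmt"

// aggregationStep is the change-file interval size mentioned in the text.
const aggregationStep = 4096

// interval is a half-open block range [From, To). This representation is an
// assumption of the sketch, not the prototype's file-name format.
type interval struct{ From, To uint64 }

// changeInterval returns the aligned 4096-block interval containing block.
func changeInterval(block uint64) interval {
	from := block - block%aggregationStep
	return interval{From: from, To: from + aggregationStep}
}

// merge joins two adjacent intervals of equal size into one of double size,
// mirroring how 4096-block data files could grow to 8192, 16384, and so on.
func merge(a, b interval) (interval, bool) {
	if a.To != b.From || a.To-a.From != b.To-b.From {
		return interval{}, false
	}
	return interval{From: a.From, To: b.To}, true
}

func main() {
	a := changeInterval(956723)
	b := changeInterval(956723 + aggregationStep)
	merged, ok := merge(a, b)
	fmt.Println(a, b, merged, ok)
}
```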
Files in the statedb directory contain the MDBX database that is used for storing the recent state (the most recent 90k + 4096 blocks). The recent state is pruned on each aggregation, which keeps the State DB size quite low.
When the prototype is launched without the --block command line option, it removes all the files from the aggregator and statedb directories. This makes iterating over changing data formats more convenient, but it needs to be kept in mind so that the results of a long run are not removed accidentally.
Currently, apart from the --block option, which allows resuming the prototype run after a graceful interruption, there are the following additional options:
Option --commfreq has a default value of 256. It means that instead of calculating the commitment (currently the root hash of a Patricia Merkle Tree over the state) after every block, the calculation happens only after blocks whose numbers are multiples of 256. The calculated commitment is compared to what is stored in the corresponding block header. Setting --commfreq 1 means calculating and checking the commitment after every block, which has the highest performance overhead, but provides more thorough checking of the commitment calculation algorithm and code (which is still experimental and is not used anywhere except for this prototype). Setting it to higher values (the maximum is 4096, because otherwise it would interfere with producing commitment data files) reduces the performance overhead, allowing the prototype to run faster. This comes at the cost of higher memory usage, because state modifications are kept in memory between subsequent computations of the commitment.
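The effect of --commfreq on when the commitment is calculated can be sketched as below. The function names are hypothetical, the lower bound of 1 is an assumption added to avoid a division by zero, and the counter of buffered blocks only stands in for the state modifications that the text says are kept in memory between computations.

```go
package main

import (
	"errors"
	"fmt"
)

// maxCommitmentFreq is the upper bound mentioned in the text.
const maxCommitmentFreq = 4096

// validateCommFreq rejects values outside 1..4096.
// The lower bound is an assumption of this sketch.
func validateCommFreq(freq uint64) error {
	if freq == 0 || freq > maxCommitmentFreq {
		return errors.New("commitment frequency must be between 1 and 4096")
	}
	return nil
}

// shouldComputeCommitment reports whether the commitment is calculated after
// the given block: only after blocks whose numbers are multiples of freq.
func shouldComputeCommitment(block, freq uint64) bool {
	return block%freq == 0
}

func main() {
	const freq = 256 // default value of --commfreq
	if err := validateCommFreq(freq); err != nil {
		fmt.Println(err)
		return
	}
	pending := 0 // blocks whose state changes are still held in memory
	for block := uint64(1); block <= 1024; block++ {
		pending++
		if shouldComputeCommitment(block, freq) {
			fmt.Printf("block %d: commitment over %d buffered blocks\n", block, pending)
			pending = 0
		}
	}
}
```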
Option --changesets is designed to turn on the generation of change sets with the granularity of a single transaction (as opposed to the granularity of a block, as in Erigon beta). This is not yet functional, and is subject to further active development and changes. This setting also has to enable writing the before value change files.
When the --pprof option is used, the prototype starts up a Web server with standard Golang pprof capabilities. It prints out the following:
INFO[01-16|21:49:04.253] Starting pprof server cpu="go tool pprof -lines -http=: http://127.0.0.1:6060/debug/pprof/profile?seconds=20" heap="go tool pprof -lines -http=: http://127.0.0.1:6060/debug/pprof/heap"
Accordingly, one can generate a CPU profile by running go tool pprof in a different console like so (to capture 2 minutes' worth of CPU profiling):
% go tool pprof "http://127.0.0.1:6060/debug/pprof/profile?seconds=120"
Fetching profile over HTTP from http://127.0.0.1:6060/debug/pprof/profile?seconds=120
When the profile comes back, one can use the png command in the pprof console to generate a picture of the profile.
Currently, the recsplit.Index type has a function Lookup that computes the enumeration of a given key in the corresponding perfect hash table. Internally, this function uses the hasher member field to compute the Murmur3 hash of the key to determine its bucket and fingerprint. The use of the hasher member field makes the Lookup function not thread-safe, so recsplit.Index cannot be safely used from multiple goroutines. The proposed solution is to create an IndexReader type that would encapsulate the hasher instead. Multiple IndexReader instances will be able to perform lookups on the same Index instance. We cannot really have multiple recsplit.Index instances for the same index file, because it is memory mapped.
Once index files can be used concurrently from multiple goroutines, the aggregation process can be off-loaded to a separate goroutine, so that the block/transaction replay can proceed while the aggregation of change files into data files, as well as the merging of data files, happens in the background. This is also how it would work in production: background aggregation should not have a big impact on the performance of on-line block processing.