-
Notifications
You must be signed in to change notification settings - Fork 166
Indexing Accounts
If you're handling a large number of files (~10k -> ~100k+, depending on your system capacities) it becomes advantageous for geth to implement persistent indexing of the key files.
🚀 This feature is currently available on the master branch, but has not been bundled into a release (yet!), and we'll be very happy to hear your feedback if you've taken advantage of this feature. 👓
Geth allows for a large amount of leniency in managing and interacting with key files. For example, you can have a keyfile named "foobar.md", and that's just fine. The downside of this is that it presents some inefficiencies in how geth is able to match accounts (by address) with their respective files. Geth also keeps a live sync of your /keystore
directory, enabling you to add/change/remove files, with those changes reflected in realtime in geth. These conveniences and complications together -- combined with the innate awkwardness of handling a large number of small files (try running ls -l
for a directory with 500k files) -- present a hurdle when the number of key files grows to a high number.
The solutions which persistent indexing uses to address these problems are
- disable FS watching and live reload, and
- establish geth's use of
accounts.db
as its persistent cache instead of holding file indexes in memory.
Instead of asking geth to re-index and store all key files each time a geth instance is started, it'll instead connect with a lightweight key-value store in /keystore/accounts.db
.
Before running geth with the indexed accounts option, you'll first need to build the index. Run:
$ geth --verbosity 5 --index-accounts account index
to create an index store (accounts.db).
The --verbosity
flag in this example is optional, but may be desirable for monitoring progress. You may additionally use --data-dir
, --chain
, and --keystore
flags to specify custom base data directory, chain subdirectory, and keystore directory as well.
Once you've built an index, running:
$ geth --index-accounts
toggles the persistent index option.
All calls managing key files via geth API/RPC/CLI will be automatically updated and persisted in the index file. However, if you make outside changes to your key files, those changes will not be synced to the index. In this case, rebuilding your index -- with the exact same command as setup) -- will suffice to to update the index with those changes.
All measurements in nanoseconds/operation with watching disabled on *2.8 GHz Intel Core 2 Duo @ 8GB RAM.
This tests a bundled general suite of four interactions without querying. Here, in-memory is king, because the test essentially measures manipulations from memory vs. disk i/o. While persistent storage takes significantly more time (as high as 0.24 seconds for the suite), it's also worth noting that both rates behave approximately constantly across every tested number of key files.
Number of Accounts | In-memory caching | Persistent caching |
---|---|---|
100 | 29147464 | 88912525 |
500 | 58596449 | 144582028 |
1000 | 32701806 | 203180542 |
5000 | 33743063 | 74667460 |
10000 | 24605203 | 133318754 |
20000 | 26082395 | 92159681 |
100000 | 21709988 | 168614805 |
200000 | 22346704 | 249697976 |
500000 | 39045149 | 161666411 |
This tests a commonly used key transaction for a pre-existing key file/account. Here's where the payoff for persistent indexing comes; persisted key indexes must never be sorted or parsed entirely.
Number of Accounts | In-memory caching | Persistent caching |
---|---|---|
100 | 2963462 | 2923034 |
500 | 2892381 | 2871620 |
1000 | 5049611 | 3070000 |
5000 | 5862756 | 3035130 |
10000 | 5824062 | 3328642 |
20000 | 724523342 | 2990753 |
100000 | 356307574244 | 3314435 |
200000 | DNF | 5204148 |
500000 | DNF | 3114311 |
❤️ Stay Classy