test(nns): Analyze list_neurons performance, and look for promising areas for optimization. #1712
Draft
daniel-wong-dfinity-org wants to merge 24 commits into master from why-list_neurons-expensive-daniel-wong
Conversation
basvandijk reviewed on Sep 27, 2024
daniel-wong-dfinity-org changed the title from "test(nns): Trying to generate flame graph(s) for list_neurons." to "test(nns): Trying to generate flame graph(s) for." on Oct 7, 2024
daniel-wong-dfinity-org changed the title from "test(nns): Trying to generate flame graph(s) for." to "test(nns): Generate canister flame graph(s)." on Oct 8, 2024
…). Running into a problem where ic-wasm instrument uses the 32-bit stable memory API, but we are already using more than 4 GiB of stable memory. Yan Chen confirmed that ic-wasm does not yet support the 64-bit API, but that can be fixed pretty easily.
… does not support the 64-bit stable memory API (the 32-bit API is deprecated). This required us to upgrade our version of walrus from 0.21.1 to 0.22.0. Not sure yet if this has bad side effects...
…g from more lax from_utf8_lossy to stricter from_utf8 + unwrap.
… ic-wasm instrument 😡
…al changes: 1. start_address is a byte offset, not page. 2. Read "name" metadata from ic-wasm instrumented WASM. This is later used to visualize. 3. Visualize. Output is at list_neurons.svg. This involved copying fn render_profiling from the ic-repl repo. (I will upload an example list_neurons.svg to GitHub shortly.)
…ection = True` to our rust_canister rules in BUILD files.
…ve fewer (more typical) number of neurons.
daniel-wong-dfinity-org force-pushed the why-list_neurons-expensive-daniel-wong branch from 5bd9d89 to b6d88a9 on October 8, 2024 18:39
…ble memory neuron count (again, per principal).
daniel-wong-dfinity-org changed the title from "test(nns): Generate canister flame graph(s)." to "test(nns): Analyze list_neurons performance, and look for promising areas for optimization." on Oct 16, 2024
… Some unused imports, but I think everything builds at this point. My favorite change here: do not tell inferno how wide to make the svg. This is super nice, because then it is displayed using as much horizontal space as you give it.
…eem to help. Not sure why.
…format used by inferno. Includes tests that show behavior is preserved. (The next commit is going to get rid of the original impl, which was copy n' pasted, and not very self-explanatory.)
…t used by ic-wasm to the format used by inferno. Previous commit shows that the new impl is equivalent.
Usage
(You need to be able to scp from [email protected].)
Limitations
The method that you want to measure must be an update, not a query. You can temporarily change a query to an update in order to measure it; however, make sure that you do not merge such changes, because they are NOT backwards compatible!
You need to disable heartbeat. Otherwise, your flame graphs will have extraneous garbage in them.
Next Steps
Prime Candid's cache by calling `ListNeuronsResponse::ty()` in `post_upgrade`, as suggested by Yan.
I'm not using all the space that I allocated to ic-wasm instrument. Before, when I tried to use all of it, I think it stomped on some other VirtualMemory's. (I guess the ideal solution is for ic-wasm instrument to support MemoryManager + VirtualMemory, rather than ask for raw stable memory space, but this might be difficult.)
Check the thread in the #dev channel where I am mainly talking to Yan Chen about getting this to work. I might be able to mine that thread for additional tasks to put into this "Next Steps" section...
Make as much of this sharable as possible, because every canister developer wants this (not just NNS).
Analysis
One of the things that the `bazel test` command outputs is instructions vs. neurons. Using linear regression, I developed a very strong model (R^2 = 0.973). This tells us that the incremental cost of fetching and serving a neuron from the heap is about 5x as much as it is from stable memory.
This is also telling us that the fixed cost/overhead of list_neurons is about 4.75 million instructions.
See attached Jupyter notebook for source code: list_neurons_instructions_vs_neuron_count.ipynb.txt
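The fit itself lives in the attached notebook; as a rough sketch of the approach (the counts and coefficients below are illustrative assumptions, not the real data), the model regresses instruction counts on heap and stable-memory neuron counts:

```python
import numpy as np

# Illustrative (made-up) measurements: one row per list_neurons call.
heap_neurons = np.array([1, 5, 10, 50, 100, 200])
stable_neurons = np.array([2, 3, 40, 10, 80, 300])

# Hypothetical instruction counts, roughly consistent with the fitted model
# described above: ~4.75M fixed overhead, and a heap neuron costing about
# 5x a stable-memory neuron.
instructions = 4_750_000 + 500_000 * heap_neurons + 100_000 * stable_neurons

# Design matrix with an intercept column; solve ordinary least squares.
X = np.column_stack([np.ones_like(heap_neurons), heap_neurons, stable_neurons])
intercept, heap_cost, stable_cost = np.linalg.lstsq(X, instructions, rcond=None)[0]

print(f"fixed overhead   ~ {intercept:,.0f} instructions")
print(f"per heap neuron   ~ {heap_cost:,.0f}")
print(f"per stable neuron ~ {stable_cost:,.0f}")
```

Because the synthetic data is exactly linear, the fit recovers the assumed coefficients; on the real measurements the same regression yields the R^2 = 0.973 model described above.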
Observations
When the model underestimates, the principal tends to have many neurons in stable memory. Likewise, when the model overestimates, the principal tends to have very few neurons in stable memory. Not sure why that is. This does not really make sense to me... (warning: I'm not a data scientist.)
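One way to check that observation quantitatively (a sketch; the per-principal numbers below are hypothetical stand-ins for the notebook's actual variables) is to correlate the model's residuals with the share of each principal's neurons that live in stable memory:

```python
import numpy as np

# Hypothetical per-principal data standing in for the notebook's variables:
# actual instruction counts, the model's predictions, and the fraction of
# the principal's neurons that live in stable memory.
actual = np.array([6.1e6, 9.3e6, 5.2e6, 14.8e6, 7.0e6])
predicted = np.array([5.9e6, 8.1e6, 5.6e6, 13.0e6, 7.4e6])
stable_fraction = np.array([0.4, 0.9, 0.1, 0.8, 0.2])

residuals = actual - predicted  # positive => model underestimated

# If underestimates coincide with principals that have many stable-memory
# neurons, this correlation should be clearly positive.
r = np.corrcoef(stable_fraction, residuals)[0, 1]
print(f"correlation(stable fraction, residual) = {r:.2f}")
```

A strongly positive correlation here would confirm the pattern described above; the interesting follow-up would be why the stable-memory coefficient is systematically too small for those principals.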
Anecdotally
Seems like there can be BIG variance in the number of instructions needed to load a neuron from stable memory. In some cases, it seems to take 100k instructions. In other cases, it seems to take 1.5M. It seems like what's happening here is that for "fast" stable memory neurons, there's no time spent fetching auxiliary data, but for "slow" stable memory neurons, most of the time fetching the neuron is spent loading auxiliary data. E.g. the neurons of principal ID an2fu-wmt2w-7fhvx-2g6qp-pq5pc-gdxu3-jfcr5-epiid-qhdzc-xnv3a-mae. We might want to dig into this, but I don't think there's time for that if we want to wrap up this investigation this week...
I think principal ID rxnbx-mhkez-7ckve-yboel-dnguy-dos76-hpdlg-fz5bf-xm5ou-72fre-wae is a good example of some (unsurprising) phenomena that seem to generalize:
Possible Solutions (not necessarily recommended!)
These three areas seem to be outside the control of the NNS. We could perhaps speed up our use of stable memory by using a more efficient serialization scheme, but that would introduce more complexity.
When Rust + WASM 64 support becomes mature, that might also help us avoid the cost of stable memory by keeping more data in heap. This has an important drawback though: we would have to "freeze dry" heap data when upgrading. If there is a large amount of this, upgrades become expensive. In any case, it does not seem like Rust + WASM 64 support is coming to the IC soon.
Examples
https://drive.google.com/file/d/1eBEuEKtKlmw4hpDkFYgrkX-h40jr9s_H/view?usp=sharing
💡 You will have to download this example in order to be able to click on boxes ☝️ to zoom in on them, because the Google Drive viewer does not seem to support this.
Related Work
Max's range_neurons optimization. IIUC, this is currently only used in background (data validation) tasks, so it would not help us here.