Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache transmissions for faster processing #3395

Open
wants to merge 6 commits into
base: mainnet-staging
Choose a base branch
from

Conversation

kpandl
Copy link
Contributor

@kpandl kpandl commented Sep 5, 2024

Motivation

The prepare_advance_to_next_quorum_block function incurs significant processing time, which contributes to increased block times. Based on a flamegraph analysis, time-consuming operations are the deserialization of objects, or more general, retrieving transmissions from storage.

Therefore, this PR introduces an LRU cache for transmissions inside the BFTPersistentStorage. The transmissions are inserted into the cache in the insert_transmissions function, alongside with the insertion into the persistent storage. The get_transmission function will first try to retrieve the transmissions from the LRU cache, and if it fails, try to retrieve it from the persistent storage.

Test Plan

Ran it locally, and also ran a load test (high tps of transfer_public executions) on aws. Thereby, we observed a ~50% reduction in block times.

Please note: The cache aims to cover two rounds of transmissions. As such, it can store MAX_COMMITTEE_SIZE * MAX_TRANSMISSIONS_PER_BATCH * 2 = 1500 transmissions. At the current MAX_TRANSACTION_SIZE of 128 kB, this results in approximately 188 MB additional memory use.

minimize diffs

fix test code

fix fmt

don't update cache

cache access speedup
Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few suggestions, other than that LGTM in general 👍.

ljedrz
ljedrz previously approved these changes Sep 10, 2024
Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@vicsn
Copy link
Contributor

vicsn commented Sep 10, 2024

CI is failing. I suggest running node and node-bft locally to find out if its actually failing. For node-router, consider using a more lenient deadline instead of the sleep (see other tests using deadline! for examples)

@@ -29,6 +29,9 @@ workspace = true
version = "2.1"
features = [ "serde", "rayon" ]

[dependencies.lru]
version = "0.12.1"
Copy link
Collaborator

@niklaslong niklaslong Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change it but just as a note to reviewers, the latest version (which will be used by cargo when building) is 0.12.4.

vicsn
vicsn previously approved these changes Sep 12, 2024
@kpandl
Copy link
Contributor Author

kpandl commented Sep 12, 2024

Thanks for the feedback! I've updated the code. The node test ran successfully locally. The node-bft test did not, but that was likely due to constraint local resources, not an issue with the code itself.

@vicsn vicsn requested a review from raychu86 September 17, 2024 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants