Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Config Change] Stylus LRU cache with bytes capacity instead of number of entries capacity #2595

Merged
merged 46 commits into from
Sep 27, 2024

Conversation

diegoximenes
Copy link
Contributor

@diegoximenes diegoximenes commented Aug 20, 2024

Resolves NIT-2698

Stylus LRU cache with bytes capacity instead of number of entries capacity.
It infers the size of each entry by computing the number of bytes related to the serialized wasm module.

This PR also removes -Wall from cgo CFLAGS.
go doesn't handle it well: https://stackoverflow.com/questions/56633817/warning-unused-variable-cgo-a.
With it the following warning was being generated without need:

cgo-gcc-prolog: In function ‘_cgo_80afc8d9354c_Cfunc_stylus_clear_lru_cache’:
cgo-gcc-prolog:139:49: warning: unused variable ‘_cgo_a’ [-Wunused-variable]

@cla-bot cla-bot bot added the s Automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA. label Aug 20, 2024
@diegoximenes diegoximenes force-pushed the stylus_cache_bytes_capacity branch 3 times, most recently from 2961c11 to b90c32a Compare August 22, 2024 13:08
@@ -75,7 +75,7 @@ var DefaultCachingConfig = CachingConfig{
SnapshotRestoreGasLimit: 300_000_000_000,
MaxNumberOfBlocksToSkipStateSaving: 0,
MaxAmountOfGasToSkipStateSaving: 0,
StylusLRUCache: 256,
StylusLRUCacheCapacity: 256,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions on the default value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with 256 for now. If we frequently run out, we can change it in a later PR.

@diegoximenes diegoximenes changed the title Stylus LRU cache with kilobytes capacity instead of number of entries capacity [Config Change] Stylus LRU cache with kilobytes capacity instead of number of entries capacity Aug 22, 2024
@diegoximenes diegoximenes marked this pull request as ready for review August 22, 2024 14:58
@diegoximenes diegoximenes changed the title [Config Change] Stylus LRU cache with kilobytes capacity instead of number of entries capacity [Config Change] Stylus LRU cache with bytes capacity instead of number of entries capacity Aug 22, 2024
Copy link
Contributor

@tsahee tsahee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

realised I'm sitting on these Pending comments for a while .. sorry

arbitrator/stylus/src/cache.rs Outdated Show resolved Hide resolved
arbitrator/stylus/src/cache.rs Outdated Show resolved Hide resolved
let count = cache.lru.len();
let metrics = LruCacheMetrics {
// add 1 to each entry to account that we subtracted 1 in the weight calculation
size_bytes: (cache.lru.weight() + count).try_into().unwrap(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normal entry is order of magnitude of 100,000. a few bytes per entry isn't worth fussing over (I'd go with weight = deser_size + 128 and return lru.weight() from here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the 128 bytes to account for overheads that you mentioned :).

I kept taking into account that clru defines that each entry consumes (weight + 1) of the cache capacity though.
This simplifies tests that checks the expected cache size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making tests simpler is a worthy goal.
Won't you get the same result if you don't decrease 1 from each weight and don't add count here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(if not - why? if so - separate PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope :(.

clru defines that each entry will consume weight(entry) + 1 of the cache's capacity.
We define that weight(entry) = size_in_bytes - 1, in that way each entry will actually consume size_in_bytes of the cache's capacity.
This +1/-1 doesn't make much of a practical difference in production, but with it the cache's interface is more straightforward to understand, since we define the cache capacity's in number of bytes, and each entry consumes the number of bytes related to it.
A more straightforward interface also simplifies testing.

That said, considering that we are decreasing 1 from each weight, we add 1 one here so the metrics are consistent.
But it wouldn't be so harmful if this metric is off by one for each entry.
But since we can be precise here without compromising much of code simplicity, I don't see why we wouldn't do this :).

eljobe
eljobe previously approved these changes Aug 29, 2024
execution/gethexec/blockchain.go Show resolved Hide resolved
execution/gethexec/blockchain.go Show resolved Hide resolved
@@ -75,7 +75,7 @@ var DefaultCachingConfig = CachingConfig{
SnapshotRestoreGasLimit: 300_000_000_000,
MaxNumberOfBlocksToSkipStateSaving: 0,
MaxAmountOfGasToSkipStateSaving: 0,
StylusLRUCache: 256,
StylusLRUCacheCapacity: 256,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with 256 for now. If we frequently run out, we can change it in a later PR.

execution/gethexec/executionengine.go Outdated Show resolved Hide resolved
eljobe
eljobe previously approved these changes Aug 30, 2024
Copy link
Member

@eljobe eljobe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

gligneul
gligneul previously approved these changes Sep 5, 2024
Copy link
Contributor

@gligneul gligneul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

arbos/programs/native.go Show resolved Hide resolved
arbitrator/stylus/src/cache.rs Show resolved Hide resolved
let count = cache.lru.len();
let metrics = LruCacheMetrics {
// add 1 to each entry to account that we subtracted 1 in the weight calculation
size_bytes: (cache.lru.weight() + count).try_into().unwrap(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making tests simpler is a worthy goal.
Won't you get the same result if you don't decrease 1 from each weight and don't add count here?

pub struct LruCounters {
pub hits: u32,
pub misses: u32,
pub does_not_fit: u32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a counter for long-term cache size and add/remove form it when we add/remove entries to the hashmap?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(separate PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure :)

@@ -963,4 +963,15 @@ func (s *ExecutionEngine) Start(ctx_in context.Context) {
}
}
})
// periodically update stylus lru cache metrics
s.LaunchThread(func(ctx context.Context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add a boolean config weather or not to collect metrics. On by default, off by default for system-tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(separate PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what is the rationale of not always enabling those metrics?
Is it some worry about Prometheus performance with the number of metrics increase?

Sometimes I think that too granular configs, like this one, are not so useful, increases code complexity, and unnecessarily creates confusion for users, that will see one more config that could be "optimized" .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rational is that the stylus cache is static and in tests I don't want multiple parallel tests measuring the same PR.
I don't plan to ever turn it off in binary, just tests.

let count = cache.lru.len();
let metrics = LruCacheMetrics {
// add 1 to each entry to account that we subtracted 1 in the weight calculation
size_bytes: (cache.lru.weight() + count).try_into().unwrap(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(if not - why? if so - separate PR)

@tsahee tsahee merged commit 8230432 into master Sep 27, 2024
16 checks passed
@tsahee tsahee deleted the stylus_cache_bytes_capacity branch September 27, 2024 01:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design-approved s Automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants