Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(vm): Refactor VM benchmarks #2668

Merged
merged 15 commits into from
Aug 27, 2024
Merged

Conversation

slowli
Copy link
Contributor

@slowli slowli commented Aug 16, 2024

What ❔

  • Integrates Prometheus metrics into criterion benches; removes the DIY benchmark correspondingly.
  • Merges the main benchmark crate with the harness one.
  • Includes benched bytecodes into the crate itself rather than reading them in runtime.

Why ❔

Makes VM benchmarks more maintainable.

Checklist

  • PR title corresponds to the body of PR (we generate changelog entries from PRs).
  • Tests for the changes have been added / updated.
  • Documentation comments have been added / updated.
  • Code has been formatted via zk fmt and zk lint.

@slowli slowli force-pushed the aov-pla-1017-refactor-vm-benchmarks branch from 7672cfd to dcf97b2 Compare August 16, 2024 12:43
Copy link
Contributor Author

@slowli slowli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of general comments:

Iai improvements are probably caused by reducing the setup complexity. Could also to do with removing some deps that have static initialization logic like RocksDB (although it should be filtered out by iai calibration logic).

If removing the DIY benchmark / extending Criterion is controversial, I can split it off from this PR. IMO, having uniform benchmark framework for running locally and Prometheus reporting is good since it allows not to define same benches multiple times. The API looks relatively straightforward. It has its drawbacks though; e.g., collected stats include warm-up iterations (so, e.g. reported means slightly differ between Criterion and Prometheus).

Metrics for a test run are in the stage Prometheus (vm_benchmark_mean_timing_seconds etc.).

.github/workflows/vm-perf-to-prometheus.yml Outdated Show resolved Hide resolved
core/tests/vm-benchmark/README.md Show resolved Hide resolved
@slowli slowli marked this pull request as ready for review August 19, 2024 15:17
@slowli slowli requested a review from a team as a code owner August 19, 2024 15:17
@joonazan
Copy link
Contributor

Regarding Criterion vs. DIY: DIY gives more useful data. The average time reported by criterion is a very bad metric, especially in noisy CI. Because our programs don't use any randomized algorithms (except hashmap in old VM), the minimum time is a good measurement.

The newer Divan framework is as pleasant as Criterion and reports minima. I use it in the vm2 repo.

@slowli
Copy link
Contributor Author

slowli commented Aug 20, 2024

The average time reported by criterion is a very bad metric, especially in noisy CI.

My understanding is that in CI, we would primarily rely on metrics exported to Prometheus, which do feature the minimum iteration time (along with maximum, mean and median, and the number of iterations). They could be printed to stdout as well (added that in the latest commit). So if it's only a question of whether these values are displayed, it's not an issue.

popzxc
popzxc previously approved these changes Aug 27, 2024
Copy link
Contributor

Detected VM performance changes

Benchmark name change in estimated runtime change in number of opcodes executed
decode_shl_sub -4.9% N/A
access_memory -3.4% N/A
deploy_simple_contract_legacy -18.0% N/A
call_far -3.2% N/A
write_and_decode -4.1% N/A
deploy_simple_contract -11.9% N/A
slot_hash_collision -4.3% N/A
call_far_legacy -2.5% N/A

Changes in number of opcodes executed indicate that the gas price of the benchmark has changed, which causes it run out of gas at a different time. Or that it is behaving completely differently.

@slowli slowli added this pull request to the merge queue Aug 27, 2024
Merged via the queue into main with commit bd2b5d8 Aug 27, 2024
49 checks passed
@slowli slowli deleted the aov-pla-1017-refactor-vm-benchmarks branch August 27, 2024 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants