test(vm): Refactor VM benchmarks #2668
Conversation
Force-pushed from 7672cfd to dcf97b2
Couple of general comments:
- The Iai improvements are probably caused by reduced setup complexity. They could also have to do with removing some deps that have static initialization logic, like RocksDB (although that should be filtered out by Iai's calibration logic).
- If removing the DIY benchmark / extending Criterion is controversial, I can split it off from this PR. IMO, having a uniform benchmark framework for local runs and Prometheus reporting is good since it avoids defining the same benches multiple times (see the sketch after this comment). The API looks relatively straightforward. It has its drawbacks, though; e.g., the collected stats include warm-up iterations, so reported means differ slightly between Criterion and Prometheus.
Metrics for a test run are in the stage Prometheus (vm_benchmark_mean_timing_seconds etc.).
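To illustrate the "define once" point, here is a minimal sketch of how a single bench list could feed both a Criterion runner and a Prometheus reporter. Everything in it (the BENCHES array, the fibonacci stand-in workload, the runner name) is hypothetical and not the PR's actual API:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Stand-in workload; in the PR the workloads are VM programs.
fn fibonacci(n: u64) -> u64 {
    (1..=n).fold((0_u64, 1_u64), |(a, b), _| (b, a.wrapping_add(b))).0
}

// Hypothetical single source of truth: each bench is declared exactly once
// and can be consumed by both a Criterion runner and a Prometheus reporter.
const BENCHES: &[(&str, fn() -> u64)] = &[("fibonacci/20", || fibonacci(20))];

fn criterion_runner(c: &mut Criterion) {
    for (name, workload) in BENCHES {
        c.bench_function(name, |b| b.iter(|| black_box(workload())));
    }
}

criterion_group!(benches, criterion_runner);
criterion_main!(benches);
```

A Prometheus reporter could iterate over the same BENCHES array and export per-bench timings instead of (or in addition to) running Criterion, so no bench is defined twice.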
Regarding Criterion vs. DIY: the DIY approach gives more useful data. The average time reported by Criterion is a very poor metric, especially in noisy CI. Because our programs don't use any randomized algorithms (except the hash map in the old VM), the minimum time is a good measurement. The newer Divan framework is as pleasant to use as Criterion and reports minima; I use it in the vm2 repo.
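For context, a minimal Divan bench looks like this (a sketch with a stand-in workload, not one of the actual VM benchmarks); Divan discovers functions annotated with #[divan::bench] and reports the fastest time among its stats:

```rust
// benches/example.rs, with `divan` as a dev-dependency.
fn main() {
    // Runs all functions annotated with `#[divan::bench]`.
    divan::main();
}

#[divan::bench]
fn fibonacci_20() -> u64 {
    fn fib(n: u64) -> u64 {
        if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
    }
    // `black_box` prevents the input from being constant-folded away.
    fib(divan::black_box(20))
}
```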
My understanding is that in CI, we would primarily rely on metrics exported to Prometheus, which do include the minimum iteration time (along with the maximum, mean, median, and number of iterations; see the sketch below). They can be printed to stdout as well (added that in the latest commit). So if it's only a question of whether these values are displayed, it's not an issue.
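For illustration, a sketch of how those per-bench stats could be derived from raw iteration timings. The helper is hypothetical; the actual reporter in the PR differs (e.g., it includes warm-up iterations, as noted above):

```rust
use std::time::{Duration, Instant};

// Hypothetical helper: time a workload for a fixed number of iterations and
// derive the stats exported to Prometheus (min, max, mean, median).
fn collect_stats(mut workload: impl FnMut(), iterations: usize) -> [Duration; 4] {
    assert!(iterations > 0);
    let mut timings: Vec<Duration> = (0..iterations)
        .map(|_| {
            let started_at = Instant::now();
            workload();
            started_at.elapsed()
        })
        .collect();
    timings.sort();
    let min = timings[0];
    let max = timings[iterations - 1];
    let mean = timings.iter().sum::<Duration>() / iterations as u32;
    let median = timings[iterations / 2];
    [min, max, mean, median]
}
```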
Detected VM performance changes
Changes in the number of opcodes executed indicate that the gas price of the benchmark has changed, causing it to run out of gas at a different point, or that it is behaving completely differently.
What ❔
Why ❔
Makes VM benchmarks more maintainable.
Checklist
Code has been formatted via zk fmt and zk lint.