To review this PR, I would suggest trying the instructions in the README to run the visualizer locally.
The visualizer is a single-page JavaScript app with one route for each view. Routing is based on window.location.hash. I guess the single-page and single-file structure is debatable, but I do think it's the easiest to work with and the best way to avoid duplication. It's written in vanilla JS 1. to keep things simple (it doesn't require any build tool), 2. because the previous version was done that way, 3. because it's the easiest way to interact with Plotly.js.
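For context, hash-based routing in a vanilla JS single-page app typically looks something like the sketch below. This is a hypothetical illustration, not the actual code in this PR; the route names and render functions are made up.

```javascript
// Illustrative render functions, one per view (names are assumptions).
function renderAggregatedView() { return "aggregated"; }
function renderCompareView() { return "compare"; }

// Map each hash to its view; an unknown or empty hash falls back
// to the aggregated view.
const routes = {
  "": renderAggregatedView,
  "#/aggregated": renderAggregatedView,
  "#/compare": renderCompareView,
};

function route(hash) {
  const handler = routes[hash] || renderAggregatedView;
  return handler();
}

// In the browser this would be wired up as:
// window.addEventListener("hashchange", () => route(window.location.hash));
// route(window.location.hash); // render the initial view
```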
This is an MVP:
This is only a part of the new benchmarks infrastructure. There will also be independent changes in the Dotty repo: benchmarks will be run with GitHub Actions and JMH only. https://github.com/lampepfl/bench/tree will eventually be archived.
Aggregated view
This view is very similar to the current visualizer (https://dotty-bench.epfl.ch) but provides options to customize the graphs. Additionally, min and max values are displayed as error bars, whereas in the current visualizer only min values can be displayed, as a separate trace.
Error bars make it easier to see whether there is a significant change in performance at some point, or whether benchmarks are misconfigured. For example, the screenshot above shows quite clearly that the Dotty benchmark is currently misconfigured: error bars should be overlapping most of the time, but they aren't. This means either that the CPU frequency is not stable enough, or that we don't run enough iterations. This will be fixed in separate PRs.
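In Plotly.js, asymmetric error bars like these are expressed with an `error_y` object holding the distances from the mean up to the max (`array`) and down to the min (`arrayminus`). A minimal sketch, with made-up data and commit hashes:

```javascript
// Illustrative per-commit statistics (not real benchmark numbers).
const means = [10.2, 10.1, 9.5];
const mins  = [10.0,  9.9, 9.3];
const maxs  = [10.5, 10.4, 9.8];

const trace = {
  x: ["abc123", "def456", "789abc"], // commit hashes (hypothetical)
  y: means,
  type: "scatter",
  error_y: {
    type: "data",
    symmetric: false,
    array: maxs.map((m, i) => m - means[i]),      // bar extends up to max
    arrayminus: means.map((m, i) => m - mins[i]), // bar extends down to min
  },
};

// In the browser: Plotly.newPlot("chart", [trace]);
```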
A well configured benchmark should be closer to this for example:
Compare view
This view aims to provide better insights when comparing two or more commits, for example to check whether a PR has a significant impact.
It shows the distribution of runtimes as box plots for all benchmarks in a single graph. There is a group of boxes for each benchmark on the x-axis, with a box for each commit, differentiated by color. Runtimes are normalized to the runtime of the first commit to ease comparison across benchmarks.
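The normalization step could be sketched as below; this is an assumption about the approach (using the first commit's median as the baseline), with hypothetical function names, not the PR's actual code.

```javascript
// Median of an array of numbers (used as the normalization baseline).
function median(xs) {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// runsByCommit: one array of raw runtimes per commit; the first commit
// is the baseline, so its normalized median is 1.
function normalize(runsByCommit) {
  const baseline = median(runsByCommit[0]);
  return runsByCommit.map(runs => runs.map(t => t / baseline));
}

// Each normalized array can then become one Plotly box trace, e.g.
// { y: normalized[i], type: "box", name: commits[i] }, grouped per
// benchmark on the x-axis with layout { boxmode: "group" }.
```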
In the screenshot below, we can easily see, for example, that the second commit (the orange one) improves the runtime of the "re2s" benchmark by ~5%, and probably does not have a significant impact on other benchmarks.
The goal is to point people to this view after benchmarks are run for a specific PR.