Add documentation for performance comparison tool #2141
base: main
Conversation
Can you please move everything to be inside either `tools/` or `scripts/`?
It might be helpful to add a link to the rendered file: https://github.com/karkhaz/kani/blob/kk-performance-benchmarking/performance-benchmarking/src/md/0010-content.md
Force-pushed from 61f8d34 to 86bc58a.
This is a beautiful piece of work! Just one high-level question that I couldn't immediately spot the answer to: how is concurrency/isolation handled here? Does `benchcomp` just not introduce any concurrency itself, and it's all left to the underlying tool (which might be `litani`)?
> `benchcomp` allows you to:
>
> * Run two or more sets of benchmark suites under different configurations;
I'd love to see a bit of context being set, perhaps an example of an actual problem one may be trying to solve.
> ```yaml
> config:
>   command_line: ./run-cbmc-proofs.py
>   env:
>     CBMC: ~/src/cbmc/build/bin/cbmc
> ```
What about `goto-cc`, `goto-instrument` -- wouldn't it be better to have a `PATH` set?
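The suggestion could be sketched like this (purely hypothetical; field names are reused from the quoted config excerpt, and the actual schema may differ):

```yaml
# Hypothetical: point the variant at a whole CBMC build directory via PATH,
# so goto-cc and goto-instrument are picked up alongside cbmc.
config:
  command_line: ./run-cbmc-proofs.py
  env:
    PATH: ~/src/cbmc/build/bin:${PATH}
```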
> </div>
> </div>
>
> `benchcomp` copies the `all_cbmc_proofs` directory to two temporary directories, one for each variant, and runs the command line. It uses the built-in `litani_to_benchcomp` parser to assemble the results. `benchcomp` then writes this data to the output file in JSON format (here in YAML for readability):
Could YAML output be supported out of the box? You've just created a use-case for it :-) (and it would simplify the documentation, for you wouldn't have to give excuses).
+1. Or just add the `json` file here, which is what the user should expect.
> <div class="subpage">
> <div class="sidebar">
> <div class="side-header">
>
> Run ID: `abc123`
>
> `2023-01-01T18:42:54`
>
> [JSON version](/) of this dashboard
>
> </div>
> <div class="tags-bar">
> <div class="tags-header">
>
> **Filter dashboards by tags**
>
> </div>
> <div class="tags-container">
>
> * [cbmc](/) <span class="n_proofs">(833)</span>
> * [s2n](/) <span class="n_proofs">(128)</span>
> * [freertos](/) <span class="n_proofs">(547)</span>
> * [e-sdk](/) <span class="n_proofs">(49)</span>
> * [uses-function-contracts](/) <span class="n_proofs">(49)</span>
>
> </div>
> </div>
> </div>
> <div class="central-container">
> <div class="central-view">
>
> ~include tmp/box_whisker/cbmc.html
>
> </div>
Is this just not rendered properly on GitHub, but otherwise works fine (including the `~include` line)?
> </div>
> </div>
>
> With either of these examples, `benchcomp` will automatically invoke the filter whenever Alice runs `benchcomp` or `benchcomp` visualize.
Should it be "`benchcomp visualize`"?
> At the highest level, users invoke `benchcomp`, a unified front-end that runs several other sub-tools in the background.
> `benchcomp` first executes `benchcomp run`, which runs one or more _benchmark suites_ several times (each time using a different _variant_).
> `bc run` eventually returns a JSON document in the [`result.json`](#result.json-schema) format, containing the union of results from all benchmark runs under every variant.
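The "union of results" described in the quoted text could be pictured as follows (a sketch only; the real `result.json` schema may differ, and all names here are invented):

```python
# Hypothetical sketch: combine per-variant benchmark results, keyed first by
# variant, into a single document keyed by benchmark name.
def merge_results(per_variant):
    """Turn {variant: {benchmark: metrics}} into {benchmark: {variant: metrics}}."""
    merged = {}
    for variant, results in per_variant.items():
        for benchmark, metrics in results.items():
            merged.setdefault(benchmark, {})[variant] = metrics
    return merged
```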
How do `bc run` and `benchcomp run` relate to each other?
Thank you @tautschnig ! :)

My intention was that benchcomp would run each benchmark suite serially by default, but users could choose to run them in parallel with a flag.

Serially by default seems sensible because I assume that a "suite" contains many benchmarks, usually more than the number of cores, and so each benchmark suite would contain its own mechanism for parallelizing its own benchmarks. This is true for Litani, among others.

I can't see how benchcomp would be able to run multiple suites in parallel without risking either over- or under-utilizing CPU cores, because each suite has its own parallelism scheduler with no central control. I haven't thought about this deeply though; it would be good if this were possible. Nevertheless, I think giving users a choice here is worthwhile.

(Roughly speaking, I intend that benchcomp will emit a Ninja file that will take care of: running each suite; parsing the results; combining the results; filtering the combined result; and generating visualizations. All in dependency order, of course. Now, the benchmark suite runs don't depend on each other, so they could run in parallel. I was intending to put the ninja jobs for the suite runs in a dedicated pool.)
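The pool idea described above might look like this in the emitted Ninja file (a sketch under my reading of the comment; the rule, pool, and target names are invented):

```ninja
# Hypothetical: a pool caps how many suite runs execute concurrently.
pool suite_runs
  depth = 1          # serial by default; a flag could raise this

rule run_suite
  command = $cmd

build suite_a.json: run_suite
  pool = suite_runs
  cmd = ./run-suite-a.sh
```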
Very good overall! I added a few comments. I also suggest adding a section that lists the inputs the user needs to provide to `benchcomp`, e.g. the yaml configuration file, a script to run a benchmark suite, a script to parse output results into json, etc.
> </div>
> <div class="col col-33">
>
> The `kani-parser.sh` script prepends the `$CBMC_DIR` environment variable to the `$PATH` before running Kani, so setting that environment variable to a different value for each variant will make Kani invoke a different version of CBMC.
Should this be `run-kani-proofs.sh` instead of `kani-parser.sh`?
> ```yaml
> directory: ./suite_1
> timeout: 7200
> memout: 48G
> patches:
> ```
What does the patch functionality do?
> ```yaml
>     USE_KISSAT: "1"
>   b:
>     provenance: file
>     path: ./suite_1/variants/b.yaml
> ```
What is the schema of this file?
> ```yaml
> metrics:
>   runtime:
>     lower_is_better: true
> ```
How does this parameter affect visualization?
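One plausible use of such a flag (an assumption on my part, not something the PR confirms) is deciding which direction of change a visualization should highlight as a regression:

```python
# Hypothetical: how a visualization might consume lower_is_better when
# deciding whether a metric moved in the "bad" direction.
def is_regression(old_value, new_value, lower_is_better=True):
    if lower_is_better:
        return new_value > old_value  # e.g. runtime went up
    return new_value < old_value      # e.g. coverage went down
```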
> ```
> [optional] "unit": str        # for axes on graphs
>
> [optional] "derivative": str  # human-readable term for
> ```
The example mentions "differential". Is that the same as "derivative"?
> This says: run two *variants* (`optimized` and `release`) of a single benchmark *suite* (`all_cbmc_proofs`).
> The variants are distinguished in this case by the `CBMC` environment variable, which the AWS [proof build system](https://github.com/model-checking/cbmc-starter-kit) uses when invoking CBMC.
It'd be nice to have a bullet-point description at this point of every single field in the yaml file, for instance:
- provenance: ...
- command_line: ...
- env: ...
I could just skip it or use it as a reference later.
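For instance, the two variants described in the quoted text might be configured along these lines (a sketch only, reusing field names from the earlier config excerpt; the exact schema may differ):

```yaml
# Hypothetical variant definitions: same suite, same command line,
# different CBMC binary selected via the env block.
variants:
  release:
    config:
      command_line: ./run-cbmc-proofs.py
      env:
        CBMC: ~/src/cbmc-release/build/bin/cbmc
  optimized:
    config:
      command_line: ./run-cbmc-proofs.py
      env:
        CBMC: ~/src/cbmc-optimized/build/bin/cbmc
```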
> Although dashboards are highly customizable, the guide describes the common elements.
>
> # User Walkthrough
It'd be nice to have in the User Walkthrough a small example that I could follow to try the tool on my own.
> </div>
> </div>
>
> Benchmarks whose runtime did not change between the two versions have a 'runtime increase' of 1.0.
Does that mean that if I see s2n(1.0) I get no change in runtime? How precise is that?
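The quoted sentence suggests "runtime increase" is a simple ratio of new to old runtime. A minimal sketch under that assumption (the function name is hypothetical):

```python
# Hypothetical definition consistent with the quoted text: a benchmark whose
# runtime is unchanged has a runtime increase of exactly 1.0.
def runtime_increase(old_runtime, new_runtime):
    return new_runtime / old_runtime
```

Under this reading, a displayed value of 1.0 would mean the measured runtimes were equal up to whatever rounding the dashboard applies, which is essentially what the comment above is asking about.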
> @@ -0,0 +1,916 @@
> `benchcomp` allows you to:
>
> * Run two or more sets of benchmark suites under different configurations;
Should this say "Run sets of benchmark suites under two or more configurations"?
> This documentation contains three sections.
> The [user walkthrough](#user-walkthrough) takes you through the process of setting up an entirely new benchmark run and dashboard.
> The [developer reference](#developer-reference) describes `benchcomp`'s architecture and the different data formats that it uses, enabling you to author a benchmark run and dashboard of your own.
We've primarily been using the "dashboard" term to mean a page that tracks performance over time. It's used here to refer to the results of a single benchcomp run, which confused me a bit. Perhaps replace it with "Visual results page" or something similar?
I haven't finished yet. So far this looks promising. Thanks!
> @@ -0,0 +1,916 @@
> `benchcomp` allows you to:
I think we are missing a description of what the tool is. What problem are you trying to solve? Is this a tool made to compare different runs of benchmark suites?
> @@ -0,0 +1,916 @@
> `benchcomp` allows you to:
>
> * Run two or more sets of benchmark suites under different configurations;
Could we use this tool to just provide visualization to one set of benchmark suite?
> ## Comparing two variants
>
> Alice wants to compare the performance of two versions of CBMC: the latest release, with and without a new optimization that she's implemented.
nit: Either use toolX or Kani. :)
> Initially, Alice tries to do this entirely using `benchcomp`'s built-in parsers and filters.
> She performs the following steps:
>
> * Create a directory containing all AWS CBMC proofs, with a top-level script that runs all of them called `run-all-cbmc-proofs.py`
Should we expand a bit on what this script is? Doesn't benchcomp rely on the output of this script?
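For what it's worth, such a top-level runner could look something like this (entirely hypothetical; the PR does not show the script's contents, and the directory layout is assumed):

```python
from pathlib import Path

# Hypothetical: build one command line per proof project under the root,
# assuming each project directory ships its own run-cbmc-proofs.py.
def proof_commands(root):
    return [
        ["python3", str(d / "run-cbmc-proofs.py")]
        for d in sorted(Path(root).iterdir())
        if (d / "run-cbmc-proofs.py").is_file()
    ]
```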
> </div>
> </div>
>
> `benchcomp` copies the `all_cbmc_proofs` directory to two temporary directories, one for each variant, and runs the command line. It uses the built-in `litani_to_benchcomp` parser to assemble the results. `benchcomp` then writes this data to the output file in JSON format (here in YAML for readability):
Also, isn't the `JSON` an intermediate step for benchcomp at this point? Should we even talk about it in this use case?
> </div>
>
> ## Adding another benchmark suite
Shouldn't you talk about adding a benchmark suite first?
> `2023-01-01T18:42:54`
>
> [JSON version](/) of this dashboard
?
> Alice runs the entire suite, together with generating and writing out the visualization, by running `benchcomp` again.
> Alternatively, she can run `benchcomp visualize < result.json`, which loads the result of the run she did in the previous section.
>
> The resulting dashboard looks like this:
I have a few questions about this graph; not sure if they're out of the scope of this PR.
(not to be merged).
The only really interesting file to comment on is 0100-content.md, which has been hidden from the diff because it's too long.