
Add documentation for performance comparison tool #2141

Draft: karkhaz wants to merge 1 commit into `main`

Conversation

@karkhaz (Contributor) commented Jan 23, 2023

(not to be merged).

The only really interesting file to comment on is 0100-content.md, which has been hidden from the diff because it's too long.

@karkhaz karkhaz requested a review from a team as a code owner January 23, 2023 12:41
@karkhaz karkhaz marked this pull request as draft January 23, 2023 12:41
@celinval (Contributor) left a comment

Can you please move everything to be inside either tools/ or scripts/?


@karkhaz (Contributor, Author) commented Jan 25, 2023

Thanks @celinval, I've moved the files to tools so the rendered page is now here.

@tautschnig (Member) left a comment

This is a beautiful piece of work! Just one high-level question that I couldn't immediately spot the answer to: how is concurrency/isolation handled here? Does benchcomp just not introduce any concurrency itself and it's all left to the underlying tool (which might be litani)?

Comment on lines +1 to +3
`benchcomp` allows you to:

* Run two or more sets of benchmark suites under different configurations;

Member:

I'd love to see a bit of context being set, perhaps an example of an actual problem one may be trying to solve.

config:
  command_line: ./run-cbmc-proofs.py
  env:
    CBMC: ~/src/cbmc/build/bin/cbmc

Member:

What about goto-cc, goto-instrument -- wouldn't it be better to have a PATH set?
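
For illustration, the reviewer's suggestion might look something like the following, reusing the `env` mapping from the quoted snippet. The path is hypothetical, and whether `benchcomp` expands `${PATH}` in environment values is an assumption, not documented behaviour:

```yaml
config:
  command_line: ./run-cbmc-proofs.py
  env:
    # Hypothetical build directory; assumes benchcomp expands ${PATH} when
    # constructing the environment, so goto-cc and goto-instrument would be
    # picked up from the same build as cbmc.
    PATH: ~/src/cbmc/build/bin:${PATH}
```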


`benchcomp` copies the `all_cbmc_proofs` directory to two temporary directories, one for each variant, and runs the command line. It uses the built-in `litani_to_benchcomp` parser to assemble the results. `benchcomp` then writes this data to the output file in JSON format (here in YAML for readability):

Member:

Could YAML output be supported out of the box? You've just created a use-case for it :-) (and it would simplify the documentation, as you wouldn't have to give excuses).

Contributor:

+1. Or just add the json file here which is what the user should expect.
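
For orientation, a minimal sketch of what that combined output could look like when rendered as YAML follows; every field name and value below is an illustrative guess rather than the documented `result.json` schema:

```yaml
# Illustrative shape only; field names and values are guesses, not the
# documented result.json schema.
metrics:
  runtime:
    lower_is_better: true
benchmarks:
  memcpy_harness:          # hypothetical benchmark name
    variants:
      release:
        metrics:
          runtime: 17.9    # hypothetical value, assumed to be seconds
      optimized:
        metrics:
          runtime: 12.4
```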

Comment on lines +152 to +186
<div class="subpage">
<div class="sidebar">
<div class="side-header">

Run ID: `abc123`

`2023-01-01T18:42:54`

[JSON version](/) of this dashboard

</div>
<div class="tags-bar">
<div class="tags-header">

**Filter dashboards by tags**

</div>
<div class="tags-container">

* [cbmc](/) <span class="n_proofs">(833)</span>
* [s2n](/) <span class="n_proofs">(128)</span>
* [freertos](/) <span class="n_proofs">(547)</span>
* [e-sdk](/) <span class="n_proofs">(49)</span>
* [uses-function-contracts](/) <span class="n_proofs">(49)</span>

</div>
</div>
</div>
<div class="central-container">
<div class="central-view">


~include tmp/box_whisker/cbmc.html

</div>

Member:

Is this just not rendered properly on GitHub, but otherwise works fine (including the ~include line)?


With either of these examples, `benchcomp` will automatically invoke the filter whenever Alice runs `benchcomp` or `benchcomp` visualize.

Member:

Should it be "benchcomp visualize"?


At the highest level, users invoke `benchcomp`, a unified front-end that runs several other sub-tools in the background.
`benchcomp` first executes `benchcomp run`, which runs one or more _benchmark suites_ several times (each time using a different _variant_).
`bc run` eventually returns a JSON document in the [`result.json`](#result.json-schema) format, containing the union of results from all benchmark runs under every variant.

Member:

How do bc run and benchcomp run relate to each other?

@karkhaz (Contributor, Author) commented Jan 31, 2023

> This is a beautiful piece of work!

Thank you @tautschnig ! :)

> Just one high-level question that I couldn't immediately spot the answer to: how is concurrency/isolation handled here? Does benchcomp just not introduce any concurrency itself and it's all left to the underlying tool (which might be litani)?

My intention was that benchcomp would run each benchmark serially by default, but users could choose to run them in parallel with a -j flag.

Serially by default seems sensible because I assume that a "suite" contains many benchmarks, usually more than the number of cores, and so each benchmark suite would contain its own mechanism for parallelizing its own benchmarks. This is true for Litani and also for cargo kani, and most other testing frameworks and build systems I guess.

I can't see how benchcomp would be able to run multiple suites in parallel without risking either over- or under-utilizing CPU cores, because each suite has its own parallelism scheduler with no central control. I haven't thought about this deeply though; it would be good if this were possible. Nevertheless, I think giving users a -j switch if they want to use it is probably okay.

(Roughly speaking, I intend that benchcomp will emit a Ninja file that will take care of: running each suite; parsing the result; combining the results; filtering the combined result; and generating visualizations. All in dependency order of course. Now, the benchmark suite runs don't depend on each other so could run in parallel. I was intending to put the ninja jobs for the suite runs in a pool called suite_runs, which would have a depth of 1 by default.)

@zhassan-aws (Contributor) left a comment

Very good overall! I added a few comments. I also suggest adding a section that lists the inputs the user needs to provide to benchcomp, e.g. the yaml configuration file, a script to run a benchmark suite, a script to parse output results into json, etc.


The `kani-parser.sh` script prepends the `$CBMC_DIR` environment variable to the `$PATH` before running Kani, so setting that environment variable to a different value for each variant will make Kani invoke a different version of CBMC.

Contributor:

Should this be run-kani-proofs.sh instead of kani-parser.sh?

directory: ./suite_1
timeout: 7200
memout: 48G
patches:

Contributor:

What does the patch functionality do?

USE_KISSAT: "1"
b:
  provenance: file
  path: ./suite_1/variants/b.yaml

Contributor:

What is the schema of this file?


metrics:
  runtime:
    lower_is_better: true

Contributor:

How does this parameter affect visualization?


[optional] "unit": str # for axes on graphs

[optional] "derivative": str # human-readable term for

Contributor:

The example mentions "differential". Is that the same as "derivative"?

Comment on lines +63 to +64
This says: run two *variants* (`optimized` and `release`) of a single benchmark *suite* (`all_cbmc_proofs`).
The variants are distinguished in this case by the `CBMC` environment variable, which the AWS [proof build system](https://github.com/model-checking/cbmc-starter-kit) uses when invoking CBMC.

Contributor:

It'd be nice to have a bullet-point description at this point of every single field in the yaml file. For instance:

* provenance: ...
* command_line: ...
* env: ...

I could then either skip it or use it as a reference later.
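
To make that request concrete, here is a sketch of how the two variants described above might be spelled out in the configuration. The `variants:` key, the nesting, and the paths are assumptions pieced together from the fragments quoted elsewhere in this review, not the documented schema:

```yaml
# Illustrative reconstruction (not copied from the document): two variants that
# differ only in which cbmc binary the proof build system invokes.
variants:
  release:
    config:
      command_line: ./run-cbmc-proofs.py
      env:
        CBMC: ~/src/cbmc-release/build/bin/cbmc    # hypothetical path
  optimized:
    config:
      command_line: ./run-cbmc-proofs.py
      env:
        CBMC: ~/src/cbmc-optimized/build/bin/cbmc  # hypothetical path
```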

Although dashboards are highly customizable, the guide describes the common elements.


# User Walkthrough

Contributor:

It'd be nice to have in the User Walkthrough a small example that I could follow to try the tool on my own.


Benchmarks whose runtime did not change between the two versions have a 'runtime increase' of 1.0.

Contributor:

Does that mean that if I see s2n(1.0) I get no change in runtime? How precise is that?

@@ -0,0 +1,916 @@
`benchcomp` allows you to:

* Run two or more sets of benchmark suites under different configurations;

Contributor:

Should this say "Run sets of benchmark suites under two or more configurations"?


This documentation contains three sections.
The [user walkthrough](#user-walkthrough) takes you through the process of setting up an entirely new benchmark run and dashboard.
The [developer reference](#developer-reference) describes `benchcomp`'s architecture and the different data formats that it uses, enabling you to author a benchmark run and dashboard of your own.

Contributor:

We've primarily been using the "dashboard" term to mean a page that tracks performance over time. It's used here to refer to the results of a single benchcomp run, which confused me a bit. Perhaps replace it with "Visual results page" or something similar?

@celinval (Contributor) left a comment

I haven't finished yet. So far this looks promising. Thanks!

@@ -0,0 +1,916 @@
`benchcomp` allows you to:

Contributor:

I think we are missing a description of what the tool is. What problem are you trying to solve? Is this a tool made to compare different runs of benchmark suites?

@@ -0,0 +1,916 @@
`benchcomp` allows you to:

* Run two or more sets of benchmark suites under different configurations;

Contributor:

Could we use this tool to just provide visualization for a single benchmark suite?


## Comparing two variants

Alice wants to compare the performance of two versions of CBMC: the latest release, with and without a new optimization that she's implemented.

Contributor:

nit: Either use toolX or Kani. :)


Initially, Alice tries to do this entirely using `benchcomp`'s built-in parsers and filters.
She performs the following steps:

* Create a directory containing all AWS CBMC proofs, with a top-level script that runs all of them called `run-all-cbmc-proofs.py`

Contributor:

Should we expand a bit on what this script is? Doesn't benchcomp rely on the output of this script?


`benchcomp` copies the `all_cbmc_proofs` directory to two temporary directories, one for each variant, and runs the command line. It uses the built-in `litani_to_benchcomp` parser to assemble the results. `benchcomp` then writes this data to the output file in JSON format (here in YAML for readability):

Contributor:

Also, isn't the JSON an intermediate step for benchcomp at this point? Should we even talk about it in this use case?



## Adding another benchmark suite

Contributor:

Shouldn't you talk about adding a benchmark suite first?


`2023-01-01T18:42:54`

[JSON version](/) of this dashboard

Contributor:

?

Alice runs the entire suite, together with generating and writing out the visualization, by running `benchcomp` again.
Alternatively, she can run `benchcomp visualize < result.json`, which loads the result of the run she did in the previous section.

The resulting dashboard looks like this:

Contributor:

I have a few questions about this graph; not sure whether that's out of scope for this PR.

@feliperodri added the T-RFC label on Mar 21, 2023
@karkhaz changed the title from "RFC: Add documentation for performance comparison tool" to "Add documentation for performance comparison tool" on Mar 21, 2023
@jaisnan assigned jaisnan and unassigned karkhaz on Jun 26, 2024