
Add documentation for performance comparison tool #2141

Draft: karkhaz wants to merge 1 commit into `main`

Conversation

@karkhaz (Contributor) commented Jan 23, 2023

(not to be merged).

The only really interesting file to comment on is 0100-content.md, which has been hidden from the diff because it's too long.

@karkhaz karkhaz requested a review from a team as a code owner January 23, 2023 12:41
@karkhaz karkhaz marked this pull request as draft January 23, 2023 12:41
@celinval (Contributor) left a comment

Can you please move everything to be inside either tools/ or scripts/?


@karkhaz (Contributor, Author) commented Jan 25, 2023

Thanks @celinval, I've moved the files to tools so the rendered page is now here.

@tautschnig (Member) left a comment

This is a beautiful piece of work! Just one high-level question that I couldn't immediately spot the answer to: how is concurrency/isolation handled here? Does benchcomp just not introduce any concurrency itself and it's all left to the underlying tool (which might be litani)?

Comment on lines +1 to +3
`benchcomp` allows you to:

* Run two or more sets of benchmark suites under different configurations;

Member:

I'd love to see a bit of context being set, perhaps an example of an actual problem one may be trying to solve.

config:
  command_line: ./run-cbmc-proofs.py
  env:
    CBMC: ~/src/cbmc/build/bin/cbmc

Member:

What about goto-cc, goto-instrument -- wouldn't it be better to have a PATH set?
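
For illustration, the reviewer's suggestion might look something like the following, reusing the `env` mapping from the quoted snippet. The path is hypothetical, and whether `benchcomp` expands `${PATH}` in environment values is an assumption, not documented behaviour:

```yaml
config:
  command_line: ./run-cbmc-proofs.py
  env:
    # Hypothetical build directory; assumes benchcomp expands ${PATH} when
    # constructing the environment, so goto-cc and goto-instrument would be
    # picked up from the same build as cbmc.
    PATH: ~/src/cbmc/build/bin:${PATH}
```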


`benchcomp` copies the `all_cbmc_proofs` directory to two temporary directories, one for each variant, and runs the command line. It uses the built-in `litani_to_benchcomp` parser to assemble the results. `benchcomp` then writes this data to the output file in JSON format (here in YAML for readability):

Member:

Could YAML output be supported out of the box? You've just created a use-case for it :-) (and it would simplify the documentation, as you wouldn't have to give excuses).

Contributor:

+1. Or just add the json file here which is what the user should expect.
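
For orientation, a minimal sketch of what that combined output could look like when rendered as YAML follows; every field name and value below is an illustrative guess rather than the documented `result.json` schema:

```yaml
# Illustrative shape only; field names and values are guesses, not the
# documented result.json schema.
metrics:
  runtime:
    lower_is_better: true
benchmarks:
  memcpy_harness:          # hypothetical benchmark name
    variants:
      release:
        metrics:
          runtime: 17.9    # hypothetical value, assumed to be seconds
      optimized:
        metrics:
          runtime: 12.4
```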

Comment on lines +152 to +186
<div class="subpage">
<div class="sidebar">
<div class="side-header">

Run ID: `abc123`

`2023-01-01T18:42:54`

[JSON version](/) of this dashboard

</div>
<div class="tags-bar">
<div class="tags-header">

**Filter dashboards by tags**

</div>
<div class="tags-container">

* [cbmc](/) <span class="n_proofs">(833)</span>
* [s2n](/) <span class="n_proofs">(128)</span>
* [freertos](/) <span class="n_proofs">(547)</span>
* [e-sdk](/) <span class="n_proofs">(49)</span>
* [uses-function-contracts](/) <span class="n_proofs">(49)</span>

</div>
</div>
</div>
<div class="central-container">
<div class="central-view">


~include tmp/box_whisker/cbmc.html

</div>

Member:

Is this just not rendered properly on GitHub, but otherwise works fine (including the ~include line)?


With either of these examples, `benchcomp` will automatically invoke the filter whenever Alice runs `benchcomp` or `benchcomp` visualize.

Member:

Should it be "benchcomp visualize"?


At the highest level, users invoke `benchcomp`, a unified front-end that runs several other sub-tools in the background.
`benchcomp` first executes `benchcomp run`, which runs one or more _benchmark suites_ several times (each time using a different _variant_).
`bc run` eventually returns a JSON document in the [`result.json`](#result.json-schema) format, containing the union of results from all benchmark runs under every variant.

Member:

How do bc run and benchcomp run relate to each other?

@karkhaz (Contributor, Author) commented Jan 31, 2023

> This is a beautiful piece of work!

Thank you @tautschnig ! :)

> Just one high-level question that I couldn't immediately spot the answer to: how is concurrency/isolation handled here? Does benchcomp just not introduce any concurrency itself and it's all left to the underlying tool (which might be litani)?

My intention was that benchcomp would run each benchmark serially by default, but users could choose to run them in parallel with a -j flag.

Serially by default seems sensible because I assume that a "suite" contains many benchmarks, usually more than the number of cores, and so each benchmark suite would contain its own mechanism for parallelizing its own benchmarks. This is true for Litani and also for cargo kani, and most other testing frameworks and build systems I guess.

I can't see how benchcomp would be able to run multiple suites in parallel without risking either over- or under-utilizing CPU cores, because each suite has its own parallelism scheduler with no central control. I haven't thought about this deeply though; it would be good if this were possible. Nevertheless, I think giving users a -j switch if they want to use it is probably okay.

(Roughly speaking, I intend that benchcomp will emit a Ninja file that will take care of: running each suite; parsing the result; combining the results; filtering the combined result; and generating visualizations. All in dependency order of course. Now, the benchmark suite runs don't depend on each other so could run in parallel. I was intending to put the ninja jobs for the suite runs in a pool called suite_runs, which would have a depth of 1 by default.)

@zhassan-aws (Contributor) left a comment

Very good overall! I added a few comments. I also suggest adding a section that lists the inputs the user needs to provide to benchcomp, e.g. the yaml configuration file, a script to run a benchmark suite, a script to parse output results into json, etc.


The `kani-parser.sh` script prepends the `$CBMC_DIR` environment variable to the `$PATH` before running Kani, so setting that environment variable to a different value for each variant will make Kani invoke a different version of CBMC.

Contributor:

Should this be run-kani-proofs.sh instead of kani-parser.sh?

directory: ./suite_1
timeout: 7200
memout: 48G
patches:

Contributor:

What does the patch functionality do?

USE_KISSAT: "1"
b:
  provenance: file
  path: ./suite_1/variants/b.yaml

Contributor:

What is the schema of this file?


metrics:
  runtime:
    lower_is_better: true

Contributor:

How does this parameter affect visualization?


[optional] "unit": str # for axes on graphs

[optional] "derivative": str # human-readable term for

Contributor:

The example mentions "differential". Is that the same as "derivative"?

Comment on lines +63 to +64
This says: run two *variants* (`optimized` and `release`) of a single benchmark *suite* (`all_cbmc_proofs`).
The variants are distinguished in this case by the `CBMC` environment variable, which the AWS [proof build system](https://github.com/model-checking/cbmc-starter-kit) uses when invoking CBMC.

Contributor:

It'd be nice to have a bullet-point description at this point of every single field in the yaml file. For instance:

* provenance: ...
* command_line: ...
* env: ...

I could then either skip it or use it as a reference later.
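
To make that request concrete, here is a sketch of how the two variants described above might be spelled out in the configuration. The `variants:` key, the nesting, and the paths are assumptions pieced together from the fragments quoted elsewhere in this review, not the documented schema:

```yaml
# Illustrative reconstruction (not copied from the document): two variants that
# differ only in which cbmc binary the proof build system invokes.
variants:
  release:
    config:
      command_line: ./run-cbmc-proofs.py
      env:
        CBMC: ~/src/cbmc-release/build/bin/cbmc    # hypothetical path
  optimized:
    config:
      command_line: ./run-cbmc-proofs.py
      env:
        CBMC: ~/src/cbmc-optimized/build/bin/cbmc  # hypothetical path
```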

Although dashboards are highly customizable, the guide describes the common elements.


# User Walkthrough

Contributor:

It'd be nice to have in the User Walkthrough a small example that I could follow to try the tool on my own.


Benchmarks whose runtime did not change between the two versions have a 'runtime increase' of 1.0.

Contributor:

Does that mean that if I see s2n(1.0) I get no change in runtime? How precise is that?

@@ -0,0 +1,916 @@
`benchcomp` allows you to:

* Run two or more sets of benchmark suites under different configurations;

Contributor:

Should this say "Run sets of benchmark suites under two or more configurations"?


This documentation contains three sections.
The [user walkthrough](#user-walkthrough) takes you through the process of setting up an entirely new benchmark run and dashboard.
The [developer reference](#developer-reference) describes `benchcomp`'s architecture and the different data formats that it uses, enabling you to author a benchmark run and dashboard of your own.

Contributor:

We've primarily been using the "dashboard" term to mean a page that tracks performance over time. It's used here to refer to the results of a single benchcomp run, which confused me a bit. Perhaps replace it with "Visual results page" or something similar?

@celinval (Contributor) left a comment

I haven't finished yet. So far this looks promising. Thanks!

@@ -0,0 +1,916 @@
`benchcomp` allows you to:

Contributor:

I think we are missing a description of what the tool is. What problem are you trying to solve? Is this a tool made to compare different runs of benchmark suites?

@@ -0,0 +1,916 @@
`benchcomp` allows you to:

* Run two or more sets of benchmark suites under different configurations;

Contributor:

Could we use this tool to just provide visualization for a single benchmark suite?


## Comparing two variants

Alice wants to compare the performance of two versions of CBMC: the latest release, with and without a new optimization that she's implemented.

Contributor:

nit: Either use toolX or Kani. :)


Initially, Alice tries to do this entirely using `benchcomp`'s built-in parsers and filters.
She performs the following steps:

* Create a directory containing all AWS CBMC proofs, with a top-level script that runs all of them called `run-all-cbmc-proofs.py`

Contributor:

Should we expand a bit on what this script is? Doesn't benchcomp rely on the output of this script?


`benchcomp` copies the `all_cbmc_proofs` directory to two temporary directories, one for each variant, and runs the command line. It uses the built-in `litani_to_benchcomp` parser to assemble the results. `benchcomp` then writes this data to the output file in JSON format (here in YAML for readability):

Contributor:

Also, isn't the JSON an intermediate step for benchcomp at this point? Should we even talk about it in this use case?



## Adding another benchmark suite

Contributor:

Shouldn't you talk about adding a benchmark suite first?


`2023-01-01T18:42:54`

[JSON version](/) of this dashboard

Contributor:

?

Alice runs the entire suite, together with generating and writing out the visualization, by running `benchcomp` again.
Alternatively, she can run `benchcomp visualize < result.json`, which loads the result of the run she did in the previous section.

The resulting dashboard looks like this:

Contributor:

I have a few questions about this graph; not sure whether that's out of scope for this PR.

@feliperodri added the T-RFC label on Mar 21, 2023
@karkhaz changed the title from "RFC: Add documentation for performance comparison tool" to "Add documentation for performance comparison tool" on Mar 21, 2023
@jaisnan assigned jaisnan and unassigned karkhaz on Jun 26, 2024