Instruction counts instead of wall clock time? #107
pyperf internally stores numbers and a unit. Some parts of the code ignore the unit and hardcode seconds, but this should be fixed. You can already switch from seconds (time) to bytes (memory footprint). I would be fine with adding an option to measure the instruction count, but I don't know how to implement that :-) There are sometimes discussions about measuring "CPU time" rather than "wall clock time". I would be OK with having an option to use a different clock, but it should be stored in the JSON so that benchmark results measuring "wall clock time" can be distinguished from the ones measuring "CPU time". Maybe "cputime" could be used as the unit?
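For reference, the two clocks discussed here are both exposed by Python's standard library. A minimal sketch of the difference (this is not pyperf's actual timer selection, just an illustration):

```python
# Illustration only, independent of pyperf internals: wall clock vs. CPU time
# as exposed by the standard library.
import time

def busy_loop(n=10**6):
    total = 0
    for i in range(n):
        total += i
    return total

wall_start = time.perf_counter()   # wall clock
cpu_start = time.process_time()    # CPU time (user + system) of this process
busy_loop()
print("wall clock:", time.perf_counter() - wall_start)
print("CPU time:  ", time.process_time() - cpu_start)
```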
Nowadays, the number of instructions executed per CPU cycle is not constant: it depends on code placement, cache efficiency, and various timing effects, so I personally prefer wall clock time. I designed pyperf to give users an idea of the performance they will see on their machine, not the performance on a server dedicated to benchmarks. Well, in practice, pyperf system tune disables Turbo Boost, whereas applications using a single CPU can run faster. But the important part for me is not the absolute value, but the ratio when comparing the performance of a change against a reference point.
Outputting instruction counts doesn't prevent you from using wall clock timings.
If someone proposes a PR, I will review it and likely merge it ;-) As I wrote, pyperf's design allows storing any number with a unit, integer or floating point.
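To make that concrete, here is a rough, hypothetical sketch of what storing instruction counts as integer-valued data could look like using pyperf's `Run` and `Benchmark` classes. The values are made up, and the exact constructor arguments and metadata keys (`name`, `unit`) should be checked against the pyperf API documentation:

```python
# Hypothetical sketch: store raw counts (fake "instruction counts" here) in a
# pyperf benchmark file with a non-second unit.  Verify the Run/Benchmark
# signatures and accepted "unit" values against the pyperf API reference.
import pyperf

# Made-up numbers standing in for values read from a hardware counter.
fake_instruction_counts = [1_002_345.0, 1_001_998.0, 1_002_410.0]

run = pyperf.Run(fake_instruction_counts,
                 metadata={'name': 'instr_demo', 'unit': 'integer'},
                 collect_metadata=False)
bench = pyperf.Benchmark([run])
bench.dump('instr_demo.json')
```

If this works as expected, `pyperf show instr_demo.json` should then display the values with that unit rather than as seconds.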
It would be interesting to investigate the use of instruction counts (through Linux's perf subsystem and similar tools on other platforms for accessing hardware performance counters) within pyperf.
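As a rough illustration of the kind of numbers those counters provide (not a pyperf integration), on Linux one can shell out to `perf stat` and read the retired-instruction count for a Python snippet. The parsing below assumes perf's CSV output format (`-x,`) and may need adjusting depending on the perf version and CPU:

```python
# Rough illustration (Linux only): count retired instructions for a Python
# snippet via `perf stat`.  Requires perf to be installed and sufficient
# perf_event_paranoid permissions.
import subprocess

cmd = ["perf", "stat", "-x", ",", "-e", "instructions",
       "python3", "-c", "sum(range(10**6))"]
# With -x, perf stat writes its counter report to stderr in CSV form:
# value,unit,event,... per line.
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
for line in result.stderr.splitlines():
    fields = line.split(",")
    if len(fields) > 2 and "instructions" in fields[2]:
        print("instructions executed:", fields[0])
```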
See, for example, Nicholas Nethercote's experience with monitoring rustc performance:
Perhaps in an interpreter, where dispatch overhead and boxing/unboxing costs can be significant, this won't hold true, since small changes have the potential to cause a much more significant change in cache misses; but it would still be worthwhile to investigate, in my view.