Instruction counts instead of wall clock time? #107

Open
gsnedders opened this issue Mar 28, 2021 · 4 comments

Comments

@gsnedders

It would be interesting to investigate use of instruction counts (through Linux's perf module and similar tools on other platforms to access hardware performance counters) within pyperf.

See, for example, Nicholas Nethercote's experience with monitoring rustc performance:

> Contrary to what you might expect, instruction counts have proven much better than wall times when it comes to detecting performance changes on CI, because instruction counts are much less variable than wall times (e.g. ±0.1% vs ±3%; the former is highly useful, the latter is barely useful). Using instruction counts to compare the performance of two entirely different programs (e.g. GCC vs clang) would be foolish, but it’s reasonable to use them to compare the performance of two almost-identical programs (e.g. rustc before PR #12345 and rustc after PR #12345). It’s rare for instruction count changes to not match wall time changes in that situation. If the parallel version of the rustc front-end ever becomes the default, it will be interesting to see if instruction counts continue to be effective in this manner.

Perhaps this won't hold true in an interpreter, where dispatch overhead and boxing/unboxing costs can be significant and small changes have the potential to cause a much more significant change in cache misses, but in my view it would still be worthwhile to investigate.
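To make the idea concrete, here is a minimal sketch of reading an instruction count on Linux by running a command under `perf stat` and parsing its CSV output. It is not part of pyperf: the helper names `parse_instruction_count` and `count_instructions` are hypothetical, and the parser assumes the usual `perf stat -x,` layout where the first field is the counter value and the third is the event name. Hybrid CPUs that report names such as `cpu_core/instructions/` are not handled.

```python
import subprocess


def parse_instruction_count(perf_csv: str) -> int:
    """Return the 'instructions' counter from `perf stat -x,` output.

    Assumed CSV layout per line: value, unit, event name, ...
    """
    for line in perf_csv.splitlines():
        fields = line.strip().split(",")
        # The event may carry a modifier suffix such as "instructions:u".
        if len(fields) >= 3 and fields[2].split(":")[0] == "instructions":
            if not fields[0].isdigit():
                # perf prints e.g. "<not supported>" or "<not counted>" here.
                raise ValueError(f"counter unavailable: {fields[0]!r}")
            return int(fields[0])
    raise ValueError("no 'instructions' event in perf output")


def count_instructions(cmd: list[str]) -> int:
    """Run cmd under `perf stat` and return its instruction count.

    Requires the Linux `perf` tool and permission to read hardware counters.
    """
    proc = subprocess.run(
        ["perf", "stat", "-x,", "-e", "instructions", "--", *cmd],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,  # perf stat reports its counters on stderr
        text=True,
        check=True,
    )
    return parse_instruction_count(proc.stderr)


# Example use (needs perf installed):
#   import sys
#   count_instructions([sys.executable, "-c", "sum(range(10**6))"])
```

Measuring a whole subprocess also counts interpreter startup, so a real integration would probably need to read the counters around the benchmarked code only.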

@vstinner
Member

pyperf internally stores numbers and a unit. Some parts of the code ignore the unit and hardcode seconds, but this should be fixed.

You can already switch from seconds (time) to bytes (memory footprint).

I would be fine with adding an option to measure the instruction count. But I don't know how to implement that :-) There are sometimes discussions about measuring "CPU time" rather than "wall clock time". I would be ok with having an option to use a different clock, but the choice should be stored in the JSON so that benchmark results measuring "wall clock time" can be distinguished from those measuring "CPU time". Maybe "cputime" can be used as the unit?
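As an illustration of the clock question, here is a small sketch using only the standard library: `time.perf_counter()` for wall clock time and `time.process_time()` for CPU time. The `measure` helper and the `"clock"` key in the result are hypothetical and do not reflect pyperf's actual JSON schema; they only show how a recorded label could keep the two kinds of results apart.

```python
import json
import time

# Map a label to a stdlib clock. The labels are illustrative only.
CLOCKS = {
    "wall": time.perf_counter,     # elapsed real time, includes sleeping
    "cputime": time.process_time,  # CPU time of this process, excludes sleeping
}


def measure(func, clock="wall"):
    """Time one call of func and record which clock produced the value."""
    timer = CLOCKS[clock]
    start = timer()
    func()
    elapsed = timer() - start
    return {"value": elapsed, "unit": "second", "clock": clock}


def sleepy():
    # Sleeping costs wall clock time but almost no CPU time.
    time.sleep(0.05)


wall = measure(sleepy, "wall")
cpu = measure(sleepy, "cputime")
print(json.dumps([wall, cpu], indent=2))
```

The two values differ widely for the same function, which is why the clock needs to be stored alongside the result.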

@vstinner
Member

Nowadays, the number of instructions executed per CPU cycle is not constant: it depends on code placement, cache efficiency, various timings, and so on. So I personally prefer wall clock time. I designed pyperf to give users an idea of the performance that they will see on their own machine, not the performance on a server dedicated to benchmarks.
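A rough way to see the point is the textbook relation: time = instructions / (instructions per cycle × clock frequency). The sketch below uses made-up numbers, not measurements, and the function name `wall_time_seconds` is only for illustration.

```python
def wall_time_seconds(instructions, ipc, frequency_hz):
    """Estimate run time from an instruction count, IPC and clock frequency."""
    cycles = instructions / ipc
    return cycles / frequency_hz


# Same instruction count, different instructions-per-cycle (IPC),
# e.g. because one build suffers more cache misses.
n = 3_000_000_000
fast = wall_time_seconds(n, ipc=2.0, frequency_hz=3.0e9)
slow = wall_time_seconds(n, ipc=1.0, frequency_hz=3.0e9)
print(fast, slow)
```

With an identical instruction count, halving the IPC doubles the estimated run time, so an unchanged count does not by itself guarantee unchanged wall clock time.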

Well, in practice, pyperf system tune disables Turbo Boost, whereas applications using a single CPU can run faster with it enabled. But the important part for me is not the absolute value, but the ratio when comparing the performance of a change against a reference point.

@markshannon

> I personally prefer wall clock time.

Outputting instruction counts doesn't prevent you using wall clock timings.

@vstinner
Member

vstinner commented Feb 2, 2022

> Outputting instruction counts doesn't prevent you using wall clock timings.

If someone proposes a PR, I will review it and likely merge it ;-) As I wrote, pyperf's design allows storing any number with a unit, whether an integer or a floating point number.
