Instruction counts instead of wall clock time? #107
pyperf internally stores numbers and a unit. Some parts of the code ignore the unit and hardcode seconds, but this should be fixed. You can already switch from seconds (time) to bytes (memory footprint). I would be fine with adding an option to measure the instruction count, but I don't know how to implement that :-) There are sometimes discussions about measuring "CPU time" rather than "wall clock time". I would be OK with having an option to use a different clock, but it should be stored in the JSON so that benchmark results measuring "wall clock time" can be distinguished from the ones measuring "CPU time". Maybe "cputime" could be used as the unit?
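For reference, the two clocks discussed here are both exposed by Python's standard library. A minimal sketch of the difference (this is not pyperf's actual timer selection, just an illustration):

```python
# Illustration only, independent of pyperf internals: wall clock vs. CPU time
# as exposed by the standard library.
import time

def busy_loop(n=10**6):
    total = 0
    for i in range(n):
        total += i
    return total

wall_start = time.perf_counter()   # wall clock
cpu_start = time.process_time()    # CPU time (user + system) of this process
busy_loop()
print("wall clock:", time.perf_counter() - wall_start)
print("CPU time:  ", time.process_time() - cpu_start)
```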
Nowadays, the number of instructions executed per CPU cycle is not constant: it depends on code placement, cache efficiency, and various timing effects, so I personally prefer wall clock time. I designed pyperf to give users an idea of the performance they will see on their machine, not the performance on a server dedicated to benchmarks. Well, in practice, pyperf system tune disables Turbo Boost, whereas applications using a single CPU can run faster. But the important part for me is not the absolute value, but the ratio when comparing the performance of a change against a reference point.
Outputting instruction counts doesn't prevent you from using wall clock timings.
If someone proposes a PR, I will review it and likely merge it ;-) As I wrote, pyperf's design allows storing any number with a unit, integer or floating point.
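To make that concrete, here is a rough, hypothetical sketch of what storing instruction counts as integer-valued data could look like using pyperf's `Run` and `Benchmark` classes. The values are made up, and the exact constructor arguments and metadata keys (`name`, `unit`) should be checked against the pyperf API documentation:

```python
# Hypothetical sketch: store raw counts (fake "instruction counts" here) in a
# pyperf benchmark file with a non-second unit.  Verify the Run/Benchmark
# signatures and accepted "unit" values against the pyperf API reference.
import pyperf

# Made-up numbers standing in for values read from a hardware counter.
fake_instruction_counts = [1_002_345.0, 1_001_998.0, 1_002_410.0]

run = pyperf.Run(fake_instruction_counts,
                 metadata={'name': 'instr_demo', 'unit': 'integer'},
                 collect_metadata=False)
bench = pyperf.Benchmark([run])
bench.dump('instr_demo.json')
```

If this works as expected, `pyperf show instr_demo.json` should then display the values with that unit rather than as seconds.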
It would be interesting to investigate the use of instruction counts (through Linux's perf subsystem and similar tools on other platforms for accessing hardware performance counters) within pyperf.
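As a rough illustration of the kind of numbers those counters provide (not a pyperf integration), on Linux one can shell out to `perf stat` and read the retired-instruction count for a Python snippet. The parsing below assumes perf's CSV output format (`-x,`) and may need adjusting depending on the perf version and CPU:

```python
# Rough illustration (Linux only): count retired instructions for a Python
# snippet via `perf stat`.  Requires perf to be installed and sufficient
# perf_event_paranoid permissions.
import subprocess

cmd = ["perf", "stat", "-x", ",", "-e", "instructions",
       "python3", "-c", "sum(range(10**6))"]
# With -x, perf stat writes its counter report to stderr in CSV form:
# value,unit,event,... per line.
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
for line in result.stderr.splitlines():
    fields = line.split(",")
    if len(fields) > 2 and "instructions" in fields[2]:
        print("instructions executed:", fields[0])
```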
See, for example, Nicholas Nethercote's experience with monitoring rustc performance:
Perhaps in an interpreter, where dispatch overhead and boxing/unboxing costs can be significant, this won't hold true, since small changes have the potential to cause a much more significant change in cache misses; but it would still be worthwhile to investigate, in my view.