-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a utility class for Relative Error compute. #1446
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we need to continue measurements until one of the following conditions are met:
- Relative error is within the acceptable limit
- Overall limit on running time is reached.
examples/common/utility/utility.hpp
Outdated
std::chrono::duration<double> duration = _endTime - _startTime; | ||
// store the duration in seconds | ||
_secPerFrame.push_back(duration.count()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually, it makes sense to introduce as little as possible overhead during the measurements to avoid influence the the computations being measured. So, consider pushing the postprocessing step to a later stage. Here only store _startTime
and _endTime
, and post process them in the independent from the main computations part, e.g., computeRelError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Aleksei, for bringing this point. I am not sure which approach is less intrusive.
Currently I have only one store with a subtract instruction. In the approach you suggested we need to 2 vector stores in the measurement loop. I was under the impression subtract + 1 store is less intrusive, please correct me if I am wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case I think I should reserve some memory for the vector stores based on the iteration count. I will update the constructor accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct! We want to have the impact from the measurement system itself as little as possible. Therefore, we need to consider implementing at least the following:
- Having the memory pre-allocated. I guess just need to add a parameter to the constructor.
- Are there cases when the number of iterations is not known beforehand?
- Do all the calculations only after the measurement loop is completed. This means inside the measurement loop can only be stores of the retrieved timings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified the constructor to accept a parameter for number of iterations and reserve the memory accordingly. All calculations are done after the measurement loop is completed.
Yes, that's the intent. The relative error may also vary from platform to platform for the same iteration count. In my experiments, for certain benchmarks, I have seen Relative error more stable on client systems whereas on high core count NUMA systems it sometimes increases with iteration. The Performance CI scripts will utilize the "Relative_Err" string to adjust the iteration count to minimize the relative error in respective systems. |
examples/common/utility/utility.hpp
Outdated
std::chrono::duration<double> duration = _endTime - _startTime; | ||
// store the duration in seconds | ||
_secPerFrame.push_back(duration.count()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct! We want to have the impact from the measurement system itself as little as possible. Therefore, we need to consider implementing at least the following:
- Having the memory pre-allocated. I guess just need to add a parameter to the constructor.
- Are there cases when the number of iterations is not known beforehand?
- Do all the calculations only after the measurement loop is completed. This means inside the measurement loop can only be stores of the retrieved timings.
Do I understand correctly that CI scripts will keep re-starting the measurements each time specifying increased number of iterations to run? So that they work as if in the following steps:
We also, can add step 3.1 that before saving the |
Add changes to store start and end time interval and postporcess. Co-authored-by: Aleksei Fedotov <[email protected]>
Yes this workflow needs to be implemented in the CI perf runner script to adjust the number of iterations until the relative error is reached within acceptable limits for the example benchmark. The intent is to improve the stability of measurements from run to run so that perf report geomean results are more reliable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More remarks to consider.
examples/common/utility/utility.hpp
Outdated
if (0 == _time_intervals.size()) { | ||
std::cout << "No time samples collected \n"; | ||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to reporting and processing of an error. I would decouple these two parts as measurements
class should mostly be responsible for taking measurements rather then detecting and reporting errors. I think at most we could have throwing of an std::domain_error
exception here or just leave an assert here that will fire in the debug mode checking the vector's size, but result in NaN
computation in release, which is okay, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, modified to use assert that will fire in debug mode.
return total + std::chrono::duration_cast<std::chrono::microseconds>( | ||
interval.second - interval.first) | ||
.count(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be better to optimize by computing the std::chrono::duration_cast<std::chrono::microseconds>(interval.second - interval.first).count()
only once and reuse the result below. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storing the duration at each time point in a vector and reading it back for accumulation and standard deviation calculations increases memory usage. In contrast, calculating the duration on the fly for these functions is more computationally complex but avoids additional memory overhead. Since this code is not on the critical path, I’m fine with the current approach. However, if you think storing durations in a vector for later access is preferable, I can switch to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see myself any preference right now. That's why I asked for you opinion. Seems like you are also not sure what would be the best approach here. Therefore, I don't think we need to change this part at least now. So, let's have it as it is now.
Don't you think that this better be implemented here instead? At least, it won't require restarting the whole benchmark for each new trial of measurements, which might be relatively complex process for some of them. |
We would like this feature to be available as part of performance CI and extend it to all benchmarks that are part of the geomean calculation. Integrating this feature into the performance CI enables benchmark modifications to minimal (specify iteration counts and output relative error directly). For benchmarks where source code access is unavailable, the script can leverage the application runtime to compute relative error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the patch is mostly done. Consider the last bunch of comments that are targeted to improve reliability of the patch.
return total + std::chrono::duration_cast<std::chrono::microseconds>( | ||
interval.second - interval.first) | ||
.count(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see myself any preference right now. That's why I asked for you opinion. Seems like you are also not sure what would be the best approach here. Therefore, I don't think we need to change this part at least now. So, let's have it as it is now.
Co-authored-by: Aleksei Fedotov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Description
Added a utility class for Relative Error compute.
Added relative error compute for seismic example.
Fixes # - issue number(s) if exists
Type of change
Choose one or multiple, leave empty if none of the other choices apply
Add a respective label(s) to PR if you have permissions
Tests
Documentation
Breaks backward compatibility
Notify the following users
List users with
@
to send notificationsOther information