
Added a utility class for Relative Error compute. #1446

Merged
14 commits merged into master on Nov 4, 2024

Conversation

sarathnandu (Contributor)

Description

Added a utility class for Relative Error compute.
Added relative error compute for seismic example.

Fixes # - issue number(s) if exists

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • [x] new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • [x] not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • [x] not needed

Breaks backward compatibility

  • Yes
  • [x] No
  • Unknown

Notify the following users

List users with @ to send notifications

Other information

@sarathnandu sarathnandu marked this pull request as draft July 13, 2024 02:07
@sarathnandu sarathnandu marked this pull request as ready for review August 1, 2024 19:19
@aleksei-fedotov (Contributor) left a comment

I believe we need to continue measurements until one of the following conditions is met:

  1. Relative error is within the acceptable limit
  2. Overall limit on running time is reached.

examples/parallel_for/seismic/main.cpp (outdated, resolved)
Comment on lines 375 to 377
std::chrono::duration<double> duration = _endTime - _startTime;
// store the duration in seconds
_secPerFrame.push_back(duration.count());
Contributor:

Usually, it makes sense to introduce as little overhead as possible during the measurements, to avoid influencing the computations being measured. So, consider pushing the post-processing step to a later stage: here, only store _startTime and _endTime, and post-process them in a part that is independent from the main computations, e.g., computeRelError.
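The suggestion above (store only raw time points inside the loop, defer all arithmetic) could look roughly like this; the class and member names are hypothetical, loosely modeled on the _startTime/_endTime/computeRelError names mentioned in this thread:

```cpp
#include <cassert>
#include <chrono>
#include <utility>
#include <vector>

// Hypothetical sketch: the measurement loop stores only raw time points;
// all conversion and statistics happen later, outside the measured region.
class IntervalRecorder {
    using clock = std::chrono::steady_clock;
    std::vector<std::pair<clock::time_point, clock::time_point>> _intervals;
    clock::time_point _start;
public:
    void startFrame() { _start = clock::now(); }
    void endFrame() { _intervals.emplace_back(_start, clock::now()); } // store only
    // Post-processing step (e.g. called from computeRelError): convert here.
    std::vector<double> seconds() const {
        std::vector<double> out;
        out.reserve(_intervals.size());
        for (const auto& iv : _intervals)
            out.push_back(std::chrono::duration<double>(iv.second - iv.first).count());
        return out;
    }
};
```

This keeps the per-frame work inside the loop to two clock reads and one vector store.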

Contributor Author (sarathnandu), Oct 14, 2024:

Thanks, Aleksei, for bringing up this point. I am not sure which approach is less intrusive.
Currently I have only one store with a subtract instruction, while in the approach you suggested we need two vector stores in the measurement loop. I was under the impression that a subtract plus one store is less intrusive; please correct me if I am wrong.

Contributor Author (sarathnandu):

In any case, I think I should reserve some memory for the vector stores based on the iteration count. I will update the constructor accordingly.

Contributor:

That's correct! We want the impact from the measurement system itself to be as small as possible. Therefore, we need to consider implementing at least the following:

  • Have the memory pre-allocated. I guess we just need to add a parameter to the constructor.
    • Are there cases when the number of iterations is not known beforehand?
  • Do all the calculations only after the measurement loop is completed. This means the measurement loop can contain only stores of the retrieved timings.

Contributor Author (sarathnandu):

Modified the constructor to accept a parameter for the number of iterations and reserve the memory accordingly. All calculations are done after the measurement loop is completed.
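A minimal sketch of the constructor change being discussed, with hypothetical names (the actual utility class in examples/common/utility/utility.hpp may differ):

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: pass the expected iteration count to the constructor
// so the vector of timings never reallocates inside the measurement loop.
class RelativeErrorMeter {
    using clock = std::chrono::steady_clock;
    std::vector<std::pair<clock::time_point, clock::time_point>> _time_intervals;
public:
    explicit RelativeErrorMeter(std::size_t expected_iterations) {
        _time_intervals.reserve(expected_iterations); // pre-allocate up front
    }
    std::size_t capacity() const { return _time_intervals.capacity(); }
};
```

std::vector::reserve guarantees that no reallocation happens until the stored element count exceeds the reserved capacity, so push_back inside the loop stays cheap and allocation-free.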

@sarathnandu (Contributor Author):

> I believe we need to continue measurements until one of the following conditions is met:
>
>   1. Relative error is within the acceptable limit
>   2. Overall limit on running time is reached.

Yes, that's the intent. The relative error may also vary from platform to platform for the same iteration count. In my experiments, for certain benchmarks, I have seen the relative error be more stable on client systems, whereas on high-core-count NUMA systems it sometimes increases with the iteration count.

The Performance CI scripts will utilize the "Relative_Err" string to adjust the iteration count to minimize the relative error on the respective systems.


examples/common/utility/utility.hpp (outdated, resolved)
examples/parallel_for/seismic/main.cpp (outdated, resolved)
examples/common/utility/utility.hpp (outdated, resolved)
@aleksei-fedotov (Contributor):

> > I believe we need to continue measurements until one of the following conditions is met:
> >
> >   1. Relative error is within the acceptable limit
> >   2. Overall limit on running time is reached.
>
> Yes, that's the intent. The relative error may also vary from platform to platform for the same iteration count. In my experiments, for certain benchmarks, I have seen the relative error be more stable on client systems, whereas on high-core-count NUMA systems it sometimes increases with the iteration count.
>
> The Performance CI scripts will utilize the "Relative_Err" string to adjust the iteration count to minimize the relative error on the respective systems.

Do I understand correctly that the CI scripts will keep re-starting the measurements, each time specifying an increased number of iterations to run? So that they work as if following these steps:

  1. Set num_iterations to the previously saved value (see step 3).
  2. Run the benchmark and get the relative error computed.
  3. If it is within the per-benchmark pre-defined limit, save that number of iterations. The system will start from that value in the next measurement session (see step 1).
  4. If it is not within the per-benchmark pre-defined limit, then:
    • If the overall per-benchmark measurement session time is acceptable, increase num_iterations (for example, double it) and go to step 2.
    • If the overall time is not acceptable, stop and report an error saying that the measurement session is unstable, the results are not reliable, and further analysis or reconsideration of either the system or the benchmark is required.

We could also add a step 3.1 that, before saving num_iterations, tries decreasing it a little and checks whether the relative error is still acceptable. This way, we could make a system that automatically and dynamically adjusts num_iterations to the smallest value that gives a stable relative error within the defined range.

@sarathnandu (Contributor Author):

sarathnandu commented Oct 28, 2024

> Do I understand correctly that the CI scripts will keep re-starting the measurements, each time specifying an increased number of iterations to run?

Yes, this workflow needs to be implemented in the CI perf runner script to adjust the number of iterations until the relative error is within acceptable limits for the example benchmark. The intent is to improve the stability of measurements from run to run so that the perf report geomean results are more reliable.
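The workflow above can be sketched as plain C++ for illustration (the real logic would live in the CI perf runner script; run_benchmark is a hypothetical stand-in for one measurement session that returns the computed relative error, and session time is modeled crudely as the number of iterations executed):

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch of the adjustment loop described in the steps above.
int find_stable_iterations(int start_iters,
                           double max_rel_err,
                           int time_budget,
                           const std::function<double(int)>& run_benchmark) {
    int iters = start_iters;
    int spent = 0;
    while (true) {
        spent += iters;                       // step 2: run one session
        if (run_benchmark(iters) <= max_rel_err)
            return iters;                     // step 3: stable, save this count
        if (spent > time_budget)
            return -1;                        // step 4b: unstable, report error
        iters *= 2;                           // step 4a: double and retry
    }
}
```

With a benchmark whose relative error shrinks as 1/n, for example, starting from one iteration with a 10% target converges after a few doublings, while an unreachable target exhausts the time budget and returns the error sentinel.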

@aleksei-fedotov (Contributor) left a comment

More remarks to consider.

examples/parallel_for/seismic/main.cpp (outdated, resolved)
Comment on lines 380 to 383
if (0 == _time_intervals.size()) {
std::cout << "No time samples collected \n";
return 0;
}
Contributor:

This is related to reporting and processing of an error. I would decouple these two parts, as the measurements class should mostly be responsible for taking measurements rather than detecting and reporting errors. I think at most we could throw an std::domain_error exception here, or just leave an assert that fires in debug mode checking the vector's size but results in a NaN computation in release, which is okay, I think.

Contributor Author (sarathnandu):

Makes sense; modified to use an assert that will fire in debug mode.
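A sketch of the agreed behaviour, using a hypothetical helper name: the assert fires in debug builds when no samples were collected, while a release (NDEBUG) build of the same code simply produces NaN for an empty sample set, which callers can detect with std::isnan:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: assert in debug, NaN (via 0.0 / 0) in release.
double meanSeconds(const std::vector<double>& samples) {
    assert(!samples.empty() && "No time samples collected");
    double sum = 0.0;
    for (double s : samples)
        sum += s;
    return sum / static_cast<double>(samples.size()); // NaN for empty input in release
}
```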

examples/common/utility/utility.hpp (outdated, resolved)
Comment on lines +390 to +392
return total + std::chrono::duration_cast<std::chrono::microseconds>(
interval.second - interval.first)
.count();
Contributor:

I wonder if it would be better to optimize by computing the std::chrono::duration_cast<std::chrono::microseconds>(interval.second - interval.first).count() only once and reusing the result below. What do you think?

Contributor Author (sarathnandu):

Storing the duration at each time point in a vector and reading it back for accumulation and standard deviation calculations increases memory usage. In contrast, calculating the duration on the fly for these functions is more computationally complex but avoids additional memory overhead. Since this code is not on the critical path, I’m fine with the current approach. However, if you think storing durations in a vector for later access is preferable, I can switch to that.

Contributor:

I don't see any preference myself right now; that's why I asked for your opinion. It seems you are also not sure what the best approach would be here, so I don't think we need to change this part, at least for now. Let's keep it as it is.
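For reference, the on-the-fly approach that was kept reads roughly like this (names are hypothetical; each statistics pass repeats the duration_cast instead of caching microsecond values in a second vector, trading a little recomputation for lower memory use):

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <numeric>
#include <utility>
#include <vector>

using SteadyClock = std::chrono::steady_clock;
using Interval = std::pair<SteadyClock::time_point, SteadyClock::time_point>;

// Hypothetical sketch: relative error as the coefficient of variation
// (standard deviation divided by mean) of the per-frame durations.
double relativeError(const std::vector<Interval>& intervals) {
    auto micros = [](const Interval& iv) {
        return static_cast<double>(std::chrono::duration_cast<std::chrono::microseconds>(
                                       iv.second - iv.first).count());
    };
    const double n = static_cast<double>(intervals.size());
    const double mean = std::accumulate(intervals.begin(), intervals.end(), 0.0,
        [&](double total, const Interval& iv) { return total + micros(iv); }) / n;
    const double variance = std::accumulate(intervals.begin(), intervals.end(), 0.0,
        [&](double total, const Interval& iv) {
            const double d = micros(iv) - mean;
            return total + d * d;
        }) / n;
    return std::sqrt(variance) / mean; // stddev relative to the mean
}
```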

@aleksei-fedotov (Contributor):

> this workflow needs to be implemented in the CI perf runner script to adjust the number of iterations until the relative error is within acceptable limits for the example benchmark.

Don't you think this would be better implemented here instead? At least it won't require restarting the whole benchmark for each new trial of measurements, which might be a relatively complex process for some of them.

@sarathnandu (Contributor Author):

> Don't you think that this better be implemented here instead? At least, it won't require restarting the whole benchmark for each new trial of measurements, which might be relatively complex process for some of them.

We would like this feature to be available as part of the performance CI and to extend it to all benchmarks that are part of the geomean calculation. Integrating this feature into the performance CI keeps benchmark modifications minimal (specify iteration counts and output the relative error directly). For benchmarks where source code access is unavailable, the script can leverage the application runtime to compute the relative error.

@aleksei-fedotov (Contributor) left a comment

I think the patch is mostly done. Consider the last batch of comments, which target improving the reliability of the patch.

examples/common/utility/utility.hpp (resolved)
examples/common/utility/utility.hpp (outdated, resolved)

examples/parallel_for/seismic/main.cpp (outdated, resolved)
examples/parallel_for/seismic/main.cpp (outdated, resolved)
examples/parallel_for/seismic/main.cpp (outdated, resolved)
@aleksei-fedotov (Contributor) left a comment

Looks good to me!

@sarathnandu sarathnandu merged commit 73ccfb5 into master Nov 4, 2024
25 checks passed
@sarathnandu sarathnandu deleted the dev/sarathna/relativeerror branch November 4, 2024 14:10