Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to main.cpp #186

Merged
merged 26 commits into from
May 26, 2024
Merged

Improvements to main.cpp #186

merged 26 commits into from
May 26, 2024

Conversation

gonzalobg
Copy link
Contributor

@gonzalobg gonzalobg commented Mar 8, 2024

This PR gives a pass to main.cpp to make it much easier to add newer benchmarks in the future.

Main changes are:

  • --only option enables specifying different groups or individual benchmarks, e.g., --only Classic runs the classic 5 benchmarks, --only All runs all benchmarks (e.g. including nstream), and --only Triad runs only triad, etc.
  • --triad-only/--nstream-only options removed (replaced by --only Triad and --only Nstream).
  • --only Triad now runs triad in the exact same way as all other benchmarks, i.e., timing each individual iteration. Before, the outer loop was being timed.
  • --gigabytes and similar option print the bandwidths in GB/s (or GiB/s, MB/s, MiB/s)

Bug fixes:

  • Metrics for Init were incorrect; the timers from Read were incorrectly used for both Init and Read.

Future:

  • After these changes, adding new benchmarks like Scan is pretty straightforward: bump benchmark count by 1, and add them to all places in the code that fail (runner, checker, and that's it..).

Copy link
Member

@tom91136 tom91136 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpicks and one major change (no using namespace std;)

src/main.cpp Outdated Show resolved Hide resolved
CMakeLists.txt Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
src/Unit.h Outdated Show resolved Hide resolved
src/Unit.h Outdated Show resolved Hide resolved
src/Unit.h Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
@gonzalobg
Copy link
Contributor Author

gonzalobg commented Mar 10, 2024

EDIT: will do this in a sub-sequent PR using intptr_t

I've noticed while testing on GPUs with more than 104 GB of memory that the results were "wrong".
The problem is that with int array_size, the largest array is 2^31 * 8 * 1e-9 = ~17.2 GB (51.6 GB with 3 arrays). With unsigned int it was ~104 GB. When trying to use larger arrays, validation would fail due to overflow.

Fixing this required using size_t in a couple of places. I've verified this for cuda, thrust, std-par, openmp, etc. I've turned the check solutions "on by default", adding a --silence-errors flag that one can pass to continue the benchmark even if validation fails.

src/Unit.h Show resolved Hide resolved
src/Unit.h Show resolved Hide resolved
src/Unit.h Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@tomdeakin tomdeakin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactor, @gonzalobg! This is looking great - just the ordering of the execution to invert in two places and then I think it's good to go.

src/main.cpp Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
CMakeLists.txt Show resolved Hide resolved
src/main.cpp Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
@gonzalobg
Copy link
Contributor Author

@tomdeakin finished the changes :)

Copy link
Contributor

@tomdeakin tomdeakin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering LGMT now - thanks.
Just some weird mixed tab/space issues causes misaligned indenting so we need to run a tabs->spaces conversion to fix that.

src/main.cpp Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
gonzalobg and others added 2 commits May 26, 2024 11:15
Co-authored-by: Tom Deakin <[email protected]>
@gonzalobg
Copy link
Contributor Author

Ordering LGMT now - thanks. Just some weird mixed tab/space issues causes misaligned indenting so we need to run a tabs->spaces conversion to fix that.

@tomdeakin fixed.

@tomdeakin tomdeakin merged commit 01224e7 into UoB-HPC:develop May 26, 2024
5 checks passed
@tomdeakin
Copy link
Contributor

Thanks @gonzalobg !

@gonzalobg gonzalobg deleted the prepare_scan branch May 26, 2024 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants