
Fix GPU target pipeline nsight reports #3376

Merged: charleskawczynski merged 1 commit into main from ck/shorten_nsys_jobs on Oct 30, 2024


@charleskawczynski (Member) commented Oct 10, 2024

Closes #3375.

@charleskawczynski force-pushed the ck/shorten_nsys_jobs branch 2 times, most recently from 0661e5f to 0846521 on October 14, 2024 21:58
Try with timed_solve

Try range-at-domain

Try with single quotes

Try using delay keyword

Shorten simulations

wip
@charleskawczynski charleskawczynski requested review from Sbozzolo and szy21 and removed request for Sbozzolo October 29, 2024 19:10
@charleskawczynski charleskawczynski changed the title Only capture nsys in solve-atmos Fix GPU target pipeline nsight reports Oct 29, 2024
@Sbozzolo (Member) commented:

There are three changes in this PR. Which one fixed the issue? Or are all three of them required?

@charleskawczynski (Member Author) commented:

> There are three changes in this PR. Which one fixed the issue? Or are all three of them required?

Basically, yes, all three are needed. The reports now work, but the jobs were running out of memory. The memory requests fix that, but the resulting report files are huge to download. So I've also shortened the simulations and added --delay so that nsys captures less of the run (the captured window should arguably be reduced further, but at least I can now open the files).
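For illustration only, a Buildkite step combining the three changes, with the kind of explanatory comments suggested in the review, might look like the sketch below. It is not the actual diff from this PR: the step label, Julia project and driver script paths, the --t_end argument, the memory and delay values, and the slurm_gpus/slurm_mem agent tags (assumed to be read by a Slurm-backed Buildkite agent) are all hypothetical. Only --delay, --trace, and --output are standard nsys profile options.

```yaml
steps:
  # HYPOTHETICAL step: label, paths, values, and agent tags are illustrative only.
  - label: "GPU target pipeline: nsight report (sketch)"
    # nsys profile options:
    #   --delay=60         start collecting after 60 s (skips startup/compilation),
    #                      so less of the run is traced and the report stays smaller
    #   --trace=cuda,nvtx  record CUDA API calls and NVTX ranges
    #   --output=...       report base name (nsys appends a report extension such as .nsys-rep)
    # --t_end is a hypothetical driver argument that shortens the simulation,
    # keeping the report small enough to download and open.
    command: >
      nsys profile --delay=60 --trace=cuda,nvtx --output=gpu_target_report
      julia --project=.buildkite path/to/driver.jl --t_end 1hours
    artifact_paths:
      - "gpu_target_report*"
    agents:
      # Assumed Slurm agent tags; the larger memory request keeps the
      # profiled job from running out of memory.
      slurm_gpus: 1
      slurm_mem: 32G
```

Putting the reason for each flag in a comment next to it (or linking this PR) keeps later edits from silently reverting the memory request or the delay.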

@Sbozzolo (Member) left a comment


Thank you!

I'd add some comments in the yml to explain why these flags are set (or a link to this issue).

@charleskawczynski charleskawczynski merged commit b9511aa into main Oct 30, 2024
12 of 14 checks passed
@charleskawczynski charleskawczynski deleted the ck/shorten_nsys_jobs branch October 30, 2024 14:42
@charleskawczynski (Member Author) commented:

Not sure why some jobs broke in the build on main, but the build previously passed (https://buildkite.com/clima/climaatmos-ci/builds/21046), so I think the failures are unrelated to this PR?

@Sbozzolo (Member) commented:

"Test restart MPI" has been failing sometimes recently, and I am not sure why. It's unrelated to this PR; we've seen it in other places too.

@szy21 (Member) commented Oct 30, 2024

Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues:
Some nsight jobs have error codes 139
3 participants