Make MPI restart test more robust #3413

Merged
merged 1 commit into from
Nov 4, 2024

Conversation

Sbozzolo
Member

Sometimes the shared filesystem is slow and the folder is not properly synced across MPI processes. This commit adds an extra check to ensure that all the MPI processes see the temporary folder before moving forward.

@Sbozzolo Sbozzolo force-pushed the gb/fixy branch 2 times, most recently from d9db8dd to 9dcafb6 Compare November 4, 2024 16:22
@Sbozzolo Sbozzolo added this pull request to the merge queue Nov 4, 2024
# Sometimes the shared filesystem doesn't work properly
# and the folder is not synced across MPI processes.
# Let's add an additional check here.
maybe_wait_filesystem(comms_ctx, output_loc)
Member

What does maybe_wait_filesystem do exactly?

Member Author

It checks that all the MPI processes see that the temporary folder was created. If they don't agree, it waits a short time and checks again.
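
For readers who want a concrete picture, the behavior described above could look roughly like the sketch below. This is not the code from this PR: the name `wait_for_path_sketch`, the keyword arguments `all_true`, `sleep_time`, and `max_attempts`, and their defaults are all assumptions made for illustration. `all_true` stands in for a logical-AND reduction across MPI processes; the default `identity` only covers the single-process case.

```julia
# Hypothetical sketch, NOT the actual maybe_wait_filesystem from this PR.
# `all_true` is an assumed hook that should return `true` only when every
# MPI process passed `true` (a logical-AND reduction). With the default
# `identity`, the check is purely local (single-process fallback).
function wait_for_path_sketch(
    path;
    all_true = identity,
    sleep_time = 0.1,   # seconds to wait between checks (assumed default)
    max_attempts = 10,  # give up after this many checks (assumed default)
)
    for _ in 1:max_attempts
        # Every process evaluates its own view of the filesystem, then all
        # processes agree on a single answer through `all_true`.
        all_true(ispath(path)) && return nothing
        # At least one process does not see the path yet: wait and retry.
        sleep(sleep_time)
    end
    error("Path $path was not visible to all processes after $max_attempts attempts")
end
```

With MPI.jl, one possible choice for the reduction is `all_true = seen -> MPI.Allreduce(Int(seen), MPI.MIN, MPI.COMM_WORLD) == 1`, assuming `MPI.Init()` has already been called. Because the reduction returns the same value on every process, all processes either return together or retry together, so the collective calls stay matched.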

Merged via the queue into main with commit a0e8612 Nov 4, 2024
16 checks passed
@Sbozzolo Sbozzolo deleted the gb/fixy branch November 4, 2024 20:46
3 participants