Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update docs on system-provided MPI including HDF5 #1706

Merged
merged 11 commits into from
Nov 21, 2023

Conversation

JoshuaLampert
Copy link
Member

@JoshuaLampert JoshuaLampert commented Nov 4, 2023

In our docs on how to use a system-provided MPI installation, we were missing that in this case also a custom HDF5 library needs to be used. I updated the docs accordingly.

Copy link
Contributor

github-actions bot commented Nov 4, 2023

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less then 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

@JoshuaLampert JoshuaLampert added the documentation Improvements or additions to documentation label Nov 4, 2023
Copy link
Member

@sloede sloede left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this. I am wondering (since I already forgot):

Could you add a small section with a rationale why all these need to be configured? For p4est and t8code it is probably easy (they initialize themselves with MPI function calls and thus MPI needs to work), while for HDF5 it is (to me) not immediately obvious.

docs/src/parallelization.md Outdated Show resolved Hide resolved
docs/src/parallelization.md Outdated Show resolved Hide resolved
docs/src/parallelization.md Outdated Show resolved Hide resolved
docs/src/parallelization.md Outdated Show resolved Hide resolved
Co-authored-by: Michael Schlottke-Lakemper <[email protected]>
@JoshuaLampert
Copy link
Member Author

JoshuaLampert commented Nov 5, 2023

Could you add a small section with a rationale why all these need to be configured?

You mean also if you don't use these libraries explicitly (otherwise it is clear because they call MPI functions and thus need to be compiled against the same MPI implementation, which is already explained in the docs)? For p4est and t8code I hope this will not be a problem anymore after having trixi-framework/P4est.jl#93 and DLR-AMR/T8code.jl#40. For HDF5, the following fails, when MPIPreferences are set, but not the HDF5.jl preference:

julia> using MPI

julia> using HDF5

julia> MPI.Init()
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
[...]

I think this is caused by the fact that the HDF5_jll is compiled with an MPI implementation that is not compatible with the implementation that is set by the MPIPreferences. using HDF5 loads HDF5_jll (since no HDF5.jl preferences are set) and therefore also the other MPI installation.

ranocha
ranocha previously approved these changes Nov 5, 2023
Copy link
Member

@ranocha ranocha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@ranocha ranocha requested a review from sloede November 6, 2023 06:34
@sloede
Copy link
Member

sloede commented Nov 6, 2023

@JoshuaLampert Thanks for the explanation above. Have you tried raising this as an issue with the HDF5 folks - maybe not to get a "fix", but at least a confirmation that this is known and understood behavior? If yes, maybe they would also be open to implementing a similar check with a warning.

@sloede
Copy link
Member

sloede commented Nov 6, 2023

You mean also if you don't use these libraries explicitly (otherwise it is clear because they call MPI functions and thus need to be compiled against the same MPI implementation, which is already explained in the docs)?

Yes, exactly. I think it would be helpful for users who do not want to use any of the packages to understand why they need to set these preferences if they want either no crashes (HDF5) or no warning messages (p4est, t8code).

@JoshuaLampert
Copy link
Member Author

JoshuaLampert commented Nov 7, 2023

Have you tried raising this as an issue with the HDF5 folks - maybe not to get a "fix", but at least a confirmation that this is known and understood behavior? If yes, maybe they would also be open to implementing a similar check with a warning.

This is known behavior and explained in a bit more detail in JuliaIO/HDF5.jl#1079. I asked for the possibility to add a warning. However, the setting for HDF5.jl is a bit different than we had for P4est.jl or T8code.jl since the error appears in the MPI.Init() and not in using P4est or using T8code. Nevertheless, one could add a warning if a system-provided MPI, but not a system-provided HDF5 is found, I think.
I added a short note to the docs in af508a6. I propose to wait until DLR-AMR/T8code.jl#40 is merged because I already included the changes done there in the docs here.

@ranocha
Copy link
Member

ranocha commented Nov 20, 2023

@sloede, @JoshuaLampert: Okay to be merged?

@JoshuaLampert
Copy link
Member Author

I would like to merge this together with #1745 because this PR describes the behavior with #1745 included.

@ranocha
Copy link
Member

ranocha commented Nov 21, 2023

Sounds good to me. Please ping me if we forget this PR when the other one is merged

@ranocha ranocha merged commit 9317f7c into trixi-framework:main Nov 21, 2023
6 checks passed
@JoshuaLampert JoshuaLampert deleted the docs-custom-mpi-hdf5 branch November 21, 2023 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants