-
Dear AMReX-Combustion community, I am trying to run PeleLMeX on GPUs on the Setonix HPC system. I started with the Flamesheet tutorial case, whose GPU performance is reported in the PeleLMeX documentation. The case compiled successfully with
I then requested a node to run it with
The simulation failed to initialise with a hipErrorOutOfMemory, indicated as:
I am wondering whether I missed any compilation flags or whether specific setup adjustments are needed to avoid this issue. Has anyone encountered a similar problem? Any suggestions or shared experiences would be greatly appreciated! Jianhong
-
@SreejithNREL - you've run on Setonix, right? Did you encounter the issue described here?
-
By default, AMReX will try to reserve 3/4 of the available GPU memory for its Arena. Maybe something is preventing it from doing this on this platform. Can you try reducing that amount, using for instance: (8 GB in bytes).
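The snippet suggested in this reply did not survive on the page. As a hedged sketch only: assuming the setting meant here is AMReX's `amrex.the_arena_init_size` runtime parameter (which takes a size in bytes), and reading "8 GB" as 8 GiB, the value can be worked out as follows. The helper name `arena_init_size_arg` is hypothetical and used for illustration only.

```python
# Assumption: the arena size is set through the AMReX runtime parameter
# `amrex.the_arena_init_size`, given in bytes. "8 GB" is read here as 8 GiB.
GIB = 1024 ** 3  # bytes per GiB


def arena_init_size_arg(gib: int) -> str:
    """Format a runtime-parameter string requesting `gib` GiB for the arena."""
    return f"amrex.the_arena_init_size={gib * GIB}"


print(arena_init_size_arg(8))
```

The resulting string could be appended to the run command line or placed (with spaces around `=` if preferred) in the case input file; the right value for a given node depends on how much GPU memory each MPI rank can actually see.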
-
I have just tried to run this case on LUMI, which is very similar to Frontier and Setonix. The case ran with a couple of changes:
Are you using GPU-aware MPI? I remember some issues with the AMReX arena init size when using GPU-aware MPI on Frontier, so you'll probably need to request a smaller amount (unless the issue has been resolved), as described in the comment above.
-
Sorry for getting back to this a bit late. Thanks @esclapez, I can now run the case on GPU on Setonix with the two settings. Much appreciated!