gpu performance with EB #751
-
Hi, thanks for reaching out! We have used EB pretty well on Frontier GPUs without issues. What type of GPUs are you using, and do you have a recent version of PeleC? We made many performance improvements in recent months. Also, can you post the profiling results (need to build with
-
Also check your version of PelePhysics if you are not using the submodule version control. There have been lots and lots of changes in PelePhysics.
-
Thank you for looking at it. I am using the latest code (downloaded a week ago). Here is the profile for the EB-ConvergingNozzle case with base grid 144 x 96 x 96 and max_level 3. This is on an A100:

CPU(0): Heap Space (bytes) used by Coalescing FAB Arena: 63704481792

This is for 288 x 192 x 192, max_level 3 (mesh count approx. 210M).
-
@indra098124 that shows the memory utilization, but when you run after compiling with TINY_PROFILE=TRUE, you should get some compute time profiling at the end of the terminal output for your simulation similar to this:
That's the profile for the default EB-ConvergingNozzle case running on CPU. It looks like the state redistribution algorithm might be what's slowing things down, but please also run on your system to verify. For this simple case using the GammaLaw EOS, you may be able to use flux redistribution ( @marchdf - thoughts on why redistribution is so slow here? Is it just that most cases we run have chemistry, and that dwarfs the redistribution cost?
-
On the GPU make sure to use |
-
Thanks @baperry2 and @jrood-nrel for the suggestions and for looking into this. I have also observed that the redistribution takes very long. Our nodes are down for maintenance, but I hope to have the new numbers by tomorrow. @baperry2 on CPU I get a similar output, but on GPU with TINY_PROFILE=TRUE I only got the output that I posted. I will try what @jrood-nrel suggested. @baperry2 may I ask if you have any guidelines or documentation on where flux redistribution might be OK to use?
-
Here is the output of the profiler, with checkpoint and plotfile writing disabled. I copied only the top rows to show the most expensive calls. Here FabArray::ParallelCopy_finish() takes about 50% of the time. This is the EB-ConvergingNozzle case with max_level 3 and a base grid of 288 x 192 x 192.

TinyProfiler total time across processes [min...avg...max]: 405.9 ... 405.9 ... 405.9

Name                              NCalls  Excl. Min  Excl. Avg  Excl. Max    Max %
FabArray::ParallelCopy_finish()      636      94.63      153.8      200.6   49.43%

Name                              NCalls  Incl. Min  Incl. Avg  Incl. Max    Max %
main()                                 1      405.9      405.9      405.9  100.00%
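For anyone comparing several such runs, here is a minimal Python sketch that pulls rows like the ones quoted above out of pasted TinyProfiler text. It assumes each data row has the layout name, NCalls, min, avg, max, percent, as in this post; the function name `parse_rows` and the regular expression are illustrative and not part of AMReX or PeleC.

```python
import re

# Assumed row layout (taken from the rows quoted in this post):
# name, NCalls, min time, avg time, max time, percent with a trailing "%".
ROW = re.compile(
    r"^(?P<name>\S.*?)\s+(?P<ncalls>\d+)\s+(?P<tmin>[\d.]+)\s+(?P<tavg>[\d.]+)"
    r"\s+(?P<tmax>[\d.]+)\s+(?P<pct>[\d.]+)%\s*$"
)

def parse_rows(text):
    """Return one dict per profiler row; header and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append({
                "name": m["name"],
                "ncalls": int(m["ncalls"]),
                "min": float(m["tmin"]),
                "avg": float(m["tavg"]),
                "max": float(m["tmax"]),
                "pct": float(m["pct"]),
            })
    return rows

sample = """\
Name                              NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
FabArray::ParallelCopy_finish()      636      94.63      153.8      200.6  49.43%
"""

rows = parse_rows(sample)
top = max(rows, key=lambda r: r["pct"])
# The max/min ratio shows how unevenly the time is spread across ranks.
print(f'{top["name"]}: Max % = {top["pct"]:.2f}, max/min = {top["max"] / top["min"]:.2f}')
```

A max/min ratio well above 1 for a communication routine would be consistent with some ranks waiting on others, though this sketch alone cannot say why.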
-
@baperry2 I agree with you. This simulation with this configuration is absolute overkill. I used it only to test GPU performance, and because its details are easy to communicate. I went this way because in my other simulation I noticed that runs with EB on GPU were noticeably slower than a similar non-EB case.

I tried FluxRedistribution, and the cost of ApplyMLRedistribution() went up from 1.24% with StateRedistribution to 5.36% with FluxRedistribution. Here are the numbers:

Name                              NCalls  Excl. Min  Excl. Avg  Excl. Max    Max %
FabArray::ParallelCopy_finish()      575      62.29      109.5      144.7   48.27%

@marchdf I am using blocking_factor 8 and max_grid 64. I also tried max_grid 128, and the numbers did not change much. This is with StateRedistribution, blocking_factor 8, and max_grid 128:

TinyProfiler total time across processes [min...avg...max]: 382.9 ... 382.9 ... 382.9

Name                              NCalls  Excl. Min  Excl. Avg  Excl. Max    Max %
FabArray::ParallelCopy_finish()      636      101.4      151.7      198.4   51.80%
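The FluxRedistribution numbers above do not include the total run time. As a rough cross-check, the Python sketch below assumes that Max % equals Excl. Max divided by the total time. That assumption is not stated in the thread, but it reproduces the reported percentages of the two runs whose totals are quoted to within rounding. The helper name `implied_total` is illustrative.

```python
# Each entry: (label, Excl. Max [s], reported Max % [%], total time [s] or None if not quoted).
runs = [
    ("StateRedistribution, earlier post", 200.6, 49.43, 405.9),
    ("FluxRedistribution", 144.7, 48.27, None),
    ("StateRedistribution, max_grid 128", 198.4, 51.80, 382.9),
]

def implied_total(excl_max, max_pct):
    """Total run time implied by Max % = 100 * Excl. Max / total (an assumption)."""
    return excl_max / (max_pct / 100.0)

for label, excl_max, max_pct, total in runs:
    if total is None:
        # No total was posted for this run, so back it out from the percentage.
        print(f"{label}: implied total ~ {implied_total(excl_max, max_pct):.1f} s")
    else:
        # Totals were posted here, so check the assumption against the reported value.
        recomputed = 100.0 * excl_max / total
        print(f"{label}: reported {max_pct:.2f}%, recomputed {recomputed:.2f}%")
```

Under that assumption the FluxRedistribution run took roughly 300 s in total. Its call count (575) differs from the other runs (636), so the totals may not be directly comparable.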
-
Hi PeleC community,
what is your experience with GPU + EB? I am finding that the EB code is very slow on GPU.