Odd scaling and memory behaviour using boomerAMG #3712

MaxOkraschevski · 2024-07-29T13:57:37Z

MaxOkraschevski
Jul 29, 2024

Dear Firedrakers,

I develop transport models for electrochemical applications with Firedrake now for a few months.
However, when I started last month to go from small-scale sequential runs (2D) to larger-scale parallel runs (3D with extruded meshes) I encountered very odd behaviour with MPI parallelization using boomerAMG as preconditioner.
Since I was not sure whether I screwed something up in the compilation process on our HPC system (closely followed this description from Delft: https://doc.dhpc.tudelft.nl/delftblue/howtos/firedrake/), I decided to reproduce a strong scaling plot from the following HPC Jupyter Notebook (the "CG+GAMG" variant in the diagram above the section "Matrix free and telescoping") : https://nbviewer.org/github/firedrakeproject/firedrake/blob/master/docs/notebooks/12-HPC_demo.ipynb

As you can see in my attached diagram the runtimes on our HPC system "BwUniCluster 2.0" reproduce the runtimes on the Archer2 machine. From this result I started to change the solution procedure towards the methods I need for my applications. First the tet-grid and the 2nd order CG polynomial (~ 30 mio. DoF) were replaced by an extruded hex-grid with DG0 (~ 33 mio. cells), still giving roughly linear scaling. Same happens for the exchange of CG with GMRES. However, as soon as I exhange GAMG by boomerAMG I merely get a scaling plateau and the 1024 core case is crashing due to the memory requirements (same behaviour for CG and GMRES). Hence, I decided to due some memory profiling (cumulated) on the intranode level for same physical problem with ~2 mio. cell/DoF.

What I find odd about the profiles is that is has these charateristic memory peaks before two consecutive plateaus. Roughly measuring the max. amplitude of those peaks compared the plateaus, you end up with 6GB for the 8 core case and 42GB for the 32 core case, indicating overproportional increasing memory requirements. This might explain why my bigger 1024 core case above crashes.

However, the other oddity I encounter is that on the intranode level there is some scaling again as indicated by the runtimes in the memory profiler, which let me think/guess that the former memory peaks cannot be the sole reason why there is no scaling on the internode level.

Summarized, my questions are:

(i) Would you expect such a strong overpropotional scaling in the memory requirement using boomerAMG?
(ii) Are those memory peaks (partly) responsibel for the bad scaling on the internode level?
(iii) Could it be that all this behaviour is anticipated?

Hoping for help! Best,
Max

connorjward · 2024-07-30T14:01:09Z

connorjward
Jul 30, 2024
Maintainer

Questions about specific solvers are probably best asked to the PETSc devs (eg). This may well be a regression in boomerAMG.

It would be informative to see a profile of the run to see where most of the runtime is spent. I recommend producing a flame graph following these instructions.

0 replies

wence- · 2024-07-30T15:32:34Z

wence-
Jul 30, 2024
Maintainer

For reasons, at least last time I checked, boomerAMGs default parameters were tuned to give good coarse grids on 2-D problems. You usually need to do a bunch of fiddling with parameters to get good coarsening for 3-D problems.

Listing 1 of https://arxiv.org/pdf/1501.01809 shows what we found to be good settings in 3D on tetrahedra at low order a decade ago. Certainly you still want to increase the default strong_threshold to closer to 0.7 than the default of 0.25. See also the MOOSE documentation https://mooseframework.inl.gov/releases/moose/v1.0.0/application_development/hypre.html

0 replies

MaxOkraschevski · 2024-08-01T09:02:28Z

MaxOkraschevski
Aug 1, 2024
Author

Thank you to you both for the quick and helpful response!

@connorjward: I will consider your advice regarding specific solvers next time. I actually produced a flame graph for one of the smaller working setups, however, I was not really sure what to expect in terms of the work split of the different solver stages. At least for this smaller problem no part was consuming almost all of the work load. But considering @wence- answer/advice I guess I would have seen something different for the larger problematic cases.

@wence- : Thank you for this recommendation. Indeed I sticked to the default values which seemed to cause these problems. Repeating the scaling tests with pc_hypre_boomeramg_stong_threshold=0.7 gives the anticipated results.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Odd scaling and memory behaviour using boomerAMG #3712

{{title}}

Replies: 3 comments

{{title}}

{{title}}

{{title}}

Select a reply

Odd scaling and memory behaviour using boomerAMG #3712

MaxOkraschevski Jul 29, 2024

Replies: 3 comments

connorjward Jul 30, 2024 Maintainer

wence- Jul 30, 2024 Maintainer

MaxOkraschevski Aug 1, 2024 Author

MaxOkraschevski
Jul 29, 2024

connorjward
Jul 30, 2024
Maintainer

wence-
Jul 30, 2024
Maintainer

MaxOkraschevski
Aug 1, 2024
Author