
Sporadic extreme slowdowns, possibly caused by Coulomb log discontinuity #198

Open · mikekryjak opened this issue Dec 22, 2023 · 2 comments
mikekryjak commented Dec 22, 2023

Thanks to Matt Khan for spotting this in his simulations and doing the initial debugging.

Our 1D STEP simulations in deep detachment (front 10 m from the target) slow down substantially upon front movement when very near steady state. This appears only once the simulation has run for hundreds of ms, to the point where successive front movements are hundreds of ms apart; it does not happen earlier in the simulation, when the front easily jumps from cell to cell in the same region of the domain without issue.

Here is a figure showing the wall-time spikes in blue, momentum spikes in green, and front position in red:
[Figure: wall-time spikes (blue), momentum spikes (green), and front position (red) vs. time]

The symptom is an extremely long iteration (10 hours vs. a typical 3 minutes on 40 cores) just before front movement.

Here is a decomposition of the ddt() components around the time of the spike. The front movement happens at the final spike in the residuals, but the wall-time spike happens just before it. There doesn't appear to be any helpful pattern here:
[Figure: ddt() component residuals around the time of the spike]

I have plotted profiles at several time slices and found nothing special about this particular front jump:
[Figures: profiles at several time slices around the front jump]

I have also plotted the domain-integrated histories of the atomic and impurity radiation reactions. These tell us nothing new, apart from showing that the hydrogenic reactions change rapidly during the slowdown (though almost as rapidly before it, and more rapidly afterwards). The reaction rates are highly nonlinear and are not preconditioned, so they could cause poor performance. However, they change substantially every time the front jumps, and the simulation has no issue until far into the steady state.
[Figure: domain-integrated histories of the atomic and impurity radiation reactions]

Diagnosing CVODE shows a spike in the number of fails and a drop in the linear-to-nonlinear iteration ratio. This (I think) means that the nonlinear iterations keep failing, which makes CVODE reduce the timestep to extremely low levels, resulting in the very long iteration. The order drops to 1, so this isn't connected with CVODE's sporadic hogging of higher orders at low timesteps. I am not sure whether anything here warrants a change in solver settings.

[Figure: CVODE diagnostics: failure counts, linear/nonlinear iteration ratio, order, and timestep]
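For reference, these counters come straight from the SUNDIALS C API. Below is a minimal sketch of pulling them after each CVode() call, assuming direct access to an already-configured `cvode_mem` block and SUNDIALS ≥ 6 (older versions use `realtype` and the `CVSpils*` getters); the function and variable names in `print_cvode_stats` are illustrative, not Hermes-3's actual diagnostics code:

```cpp
// Minimal sketch: query CVODE counters to spot the failure pattern
// described above. Assumes `cvode_mem` is a valid, initialised CVODE
// memory block with an iterative (Krylov) linear solver attached.
#include <cstdio>
#include <cvode/cvode.h>
#include <cvode/cvode_ls.h>

void print_cvode_stats(void *cvode_mem) {
  long int nni = 0, ncfn = 0, nli = 0, netf = 0;
  int qlast = 0;
  sunrealtype hlast = 0.0; // `realtype` on SUNDIALS < 6

  CVodeGetNumNonlinSolvIters(cvode_mem, &nni);      // nonlinear iterations
  CVodeGetNumNonlinSolvConvFails(cvode_mem, &ncfn); // nonlinear conv. failures
  CVodeGetNumLinIters(cvode_mem, &nli);             // linear (Krylov) iterations
  CVodeGetNumErrTestFails(cvode_mem, &netf);        // local error test failures
  CVodeGetLastOrder(cvode_mem, &qlast);             // BDF order of last step
  CVodeGetLastStep(cvode_mem, &hlast);              // size of last internal step

  // A spike in ncfn together with a falling nli/nni ratio and qlast -> 1
  // is the signature seen in the figure above.
  printf("nni=%ld ncfn=%ld nli=%ld netf=%ld order=%d h=%g lin/nonlin=%g\n",
         nni, ncfn, nli, netf, qlast, (double)hlast,
         nni > 0 ? (double)nli / (double)nni : 0.0);
}
```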

I have done a test where I restarted the simulation from just before the spike, to see if the issue is caused by the CVODE algorithm tuning its settings over the long quiescent period and then being "surprised" by the spike (thanks for the idea @bendudson). Unfortunately, the spike happens in the same way as before. The next step is to significantly reduce the timestep and rerun the simulation to get a better handle on which dynamics are connected with the spike (the current timestep is 5 ms).

mikekryjak added the bug and performance labels on Dec 22, 2023
mikekryjak self-assigned this on Dec 22, 2023

mikekryjak commented Apr 10, 2024

This could be caused by a non-monotonic jump in the Coulomb logarithm. Thanks to Stefan Mijin for identifying this in ReMKiT1D.

[Figure: Coulomb logarithm vs. electron temperature, showing the jump at the branch point]
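For context, here is a minimal sketch (not the Hermes-3 or ReMKiT1D implementation) of an NRL-formulary-style electron-ion Coulomb logarithm, with nₑ in cm⁻³, Tₑ in eV, and Z = 1, showing the jump at the conventional Tₑ = 10 eV branch point:

```cpp
// Sketch of the two-branch electron-ion Coulomb logarithm, with the
// conventional branch point at Te = 10 eV (ne in cm^-3, Te in eV, Z = 1).
#include <cmath>
#include <cstdio>

double coulomb_log_ei(double ne, double Te) {
  if (Te < 10.0) {
    return 23.0 - std::log(std::sqrt(ne) * std::pow(Te, -1.5)); // low-Te branch
  }
  return 24.0 - std::log(std::sqrt(ne) / Te); // high-Te branch
}

int main() {
  const double ne = 1e14; // 10^20 m^-3, a plausible detachment-front density
  // The two branches do not meet at Te = 10 eV, so lnLambda jumps as the
  // front region heats or cools through the branch point:
  printf("lnLambda(Te=10-) = %f\n", coulomb_log_ei(ne, 10.0 - 1e-9)); // ~10.34
  printf("lnLambda(Te=10+) = %f\n", coulomb_log_ei(ne, 10.0));        // ~10.18
  return 0;
}
```

At this density the jump is about 0.15 in ln Λ, a small step change in the collision terms that the RHS presents to the nonlinear solver whenever the solution crosses 10 eV.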


mikekryjak commented Oct 3, 2024

Now fixed in ReMKiT1D. Just changed the 10 to e^2.

ukaea/ReMKiT1D@a6a17b8
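For the record, a quick check (not taken from the commit itself) of why e² is the value that makes the two branches meet: setting the low- and high-Tₑ expressions equal,

$$23 - \ln\!\left(\sqrt{n_e}\,T_e^{-3/2}\right) = 24 - \ln\!\left(\sqrt{n_e}\,T_e^{-1}\right) \;\Longrightarrow\; \tfrac{1}{2}\ln T_e = 1 \;\Longrightarrow\; T_e = e^2 \approx 7.39~\text{eV},$$

independent of $n_e$, so the discontinuity vanishes at every density.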

mikekryjak changed the title from "Sporadic extreme slowdowns in deep detachment after settling for very long time" to "Sporadic extreme slowdowns, possibly caused by Coulomb log discontinuity" on Oct 3, 2024