
Sporadic extreme slowdowns, possibly caused by Coulomb log discontinuity #198

Open · mikekryjak opened this issue Dec 22, 2023 · 2 comments
mikekryjak commented Dec 22, 2023

Thanks to Matt Khan for spotting this in his simulations and doing the initial debugging.

Our 1D STEP simulations in deep detachment (front 10 m from the target) slow down substantially upon front movement when very near steady state. This appears only once the simulation has run for hundreds of ms, to the point where successive front movements are hundreds of ms apart; it does not happen earlier in the simulation, when the front easily jumps from cell to cell in the same region of the domain without issue.

Here is a figure showing the wall-time spikes in blue, momentum spikes in green, and front position in red:
[Figure: wall-time spikes (blue), momentum spikes (green), and front position (red) vs. time]

The symptom is an extremely long iteration (10 hours vs. a typical 3 minutes on 40 cores) just before front movement.

Here is a decomposition of the ddt() components around the time of the spike. The front movement happens at the final spike in the residuals, but the wall-time spike happens just before it. There doesn't appear to be any helpful pattern here:
[Figure: ddt() component residuals around the time of the spike]

I have plotted profiles at several time slices and found nothing special about this particular front jump:
[Figures: profiles at several time slices around the front jump]

I have also plotted the domain-integrated histories of the atomic and impurity radiation reactions. These tell us nothing new, apart from showing that the hydrogenic reactions change rapidly during the slowdown (though almost as rapidly before it, and more rapidly afterwards). The reaction rates are highly nonlinear and are not preconditioned, so they could cause poor performance. However, they change substantially every time the front jumps, and the simulation has no issue until far into the steady state.
[Figure: domain-integrated histories of the atomic and impurity radiation reactions]

Diagnosing CVODE shows a spike in the number of fails and a drop in the linear-to-nonlinear iteration ratio. This (I think) means that the nonlinear iterations keep failing, which makes CVODE reduce the timestep to extremely low levels, resulting in the very long iteration. The order drops to 1, so this isn't connected with CVODE's sporadic hogging of higher orders at low timesteps. I am not sure whether anything here warrants a change in solver settings.

[Figure: CVODE diagnostics: failure counts, linear/nonlinear iteration ratio, order, and timestep]
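For reference, these counters come straight from the SUNDIALS C API. Below is a minimal sketch of pulling them after each CVode() call, assuming direct access to an already-configured `cvode_mem` block and SUNDIALS ≥ 6 (older versions use `realtype` and the `CVSpils*` getters); the function and variable names in `print_cvode_stats` are illustrative, not Hermes-3's actual diagnostics code:

```cpp
// Minimal sketch: query CVODE counters to spot the failure pattern
// described above. Assumes `cvode_mem` is a valid, initialised CVODE
// memory block with an iterative (Krylov) linear solver attached.
#include <cstdio>
#include <cvode/cvode.h>
#include <cvode/cvode_ls.h>

void print_cvode_stats(void *cvode_mem) {
  long int nni = 0, ncfn = 0, nli = 0, netf = 0;
  int qlast = 0;
  sunrealtype hlast = 0.0; // `realtype` on SUNDIALS < 6

  CVodeGetNumNonlinSolvIters(cvode_mem, &nni);      // nonlinear iterations
  CVodeGetNumNonlinSolvConvFails(cvode_mem, &ncfn); // nonlinear conv. failures
  CVodeGetNumLinIters(cvode_mem, &nli);             // linear (Krylov) iterations
  CVodeGetNumErrTestFails(cvode_mem, &netf);        // local error test failures
  CVodeGetLastOrder(cvode_mem, &qlast);             // BDF order of last step
  CVodeGetLastStep(cvode_mem, &hlast);              // size of last internal step

  // A spike in ncfn together with a falling nli/nni ratio and qlast -> 1
  // is the signature seen in the figure above.
  printf("nni=%ld ncfn=%ld nli=%ld netf=%ld order=%d h=%g lin/nonlin=%g\n",
         nni, ncfn, nli, netf, qlast, (double)hlast,
         nni > 0 ? (double)nli / (double)nni : 0.0);
}
```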

I have done a test where I restarted the simulation from just before the spike, to see if the issue is caused by the CVODE algorithm tuning its settings over the long quiescent period and then being "surprised" by the spike (thanks for the idea @bendudson). Unfortunately, the spike happens in the same way as before. The next step is to significantly reduce the timestep and rerun the simulation to get a better handle on which dynamics are connected with the spike (the current timestep is 5 ms).

mikekryjak added the bug and performance labels on Dec 22, 2023
mikekryjak self-assigned this on Dec 22, 2023

mikekryjak commented Apr 10, 2024

This could be caused by a non-monotonic jump in the Coulomb logarithm. Thanks to Stefan Mijin for identifying this in ReMKiT1D.

[Figure: Coulomb logarithm vs. electron temperature, showing the jump at the branch point]
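For context, here is a minimal sketch (not the Hermes-3 or ReMKiT1D implementation) of an NRL-formulary-style electron-ion Coulomb logarithm, with nₑ in cm⁻³, Tₑ in eV, and Z = 1, showing the jump at the conventional Tₑ = 10 eV branch point:

```cpp
// Sketch of the two-branch electron-ion Coulomb logarithm, with the
// conventional branch point at Te = 10 eV (ne in cm^-3, Te in eV, Z = 1).
#include <cmath>
#include <cstdio>

double coulomb_log_ei(double ne, double Te) {
  if (Te < 10.0) {
    return 23.0 - std::log(std::sqrt(ne) * std::pow(Te, -1.5)); // low-Te branch
  }
  return 24.0 - std::log(std::sqrt(ne) / Te); // high-Te branch
}

int main() {
  const double ne = 1e14; // 10^20 m^-3, a plausible detachment-front density
  // The two branches do not meet at Te = 10 eV, so lnLambda jumps as the
  // front region heats or cools through the branch point:
  printf("lnLambda(Te=10-) = %f\n", coulomb_log_ei(ne, 10.0 - 1e-9)); // ~10.34
  printf("lnLambda(Te=10+) = %f\n", coulomb_log_ei(ne, 10.0));        // ~10.18
  return 0;
}
```

At this density the jump is about 0.15 in ln Λ, a small step change in the collision terms that the RHS presents to the nonlinear solver whenever the solution crosses 10 eV.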


mikekryjak commented Oct 3, 2024

Now fixed in ReMKiT1D. Just changed the 10 to e^2.

ukaea/ReMKiT1D@a6a17b8
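For the record, a quick check (not taken from the commit itself) of why e² is the value that makes the two branches meet: setting the low- and high-Tₑ expressions equal,

$$23 - \ln\!\left(\sqrt{n_e}\,T_e^{-3/2}\right) = 24 - \ln\!\left(\sqrt{n_e}\,T_e^{-1}\right) \;\Longrightarrow\; \tfrac{1}{2}\ln T_e = 1 \;\Longrightarrow\; T_e = e^2 \approx 7.39~\text{eV},$$

independent of $n_e$, so the discontinuity vanishes at every density.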

mikekryjak changed the title from "Sporadic extreme slowdowns in deep detachment after settling for very long time" to "Sporadic extreme slowdowns, possibly caused by Coulomb log discontinuity" on Oct 3, 2024