Thanks to Matt Khan for spotting this in his simulations and doing the initial debugging.
Our 1D STEP simulations in deep detachment (10 m from target) slow down substantially when the front moves while the simulation is very near steady state. This seems to happen only once the simulation has run for hundreds of ms, to the point where successive front movements are hundreds of ms apart. It does not happen earlier in the simulation, where the front jumps from cell to cell in the same region of the domain without issue.
Here is a figure showing the wall time spikes in blue, momentum spikes in green and front position in red:
The symptom is an extremely long iteration (10 h vs. a typical 3 min on 40 cores) just before the front moves.
Here is a decomposition of ddt() components around the time of the spike. The front movement happens in the final spike in residuals, but the wall time spike happens just before. There doesn't appear to be any helpful pattern to this:
I have plotted profiles at several timeslices and found nothing special about this particular front jump:
I have also plotted the domain-integral histories of the atomic and impurity radiation reactions. They tell us nothing new, except that the hydrogenic reactions change rapidly during the slowdown (but almost as rapidly before it, and more rapidly afterwards). Reaction rates are highly nonlinear and have no preconditioner, so they could cause poor performance. However, they change a lot every time the front jumps, and there is no issue until the simulation is far into the steady state.
The CVODE diagnostics show a spike in the number of fails and a drop in the ratio of linear to nonlinear iterations. This (I think) means that the nonlinear iterations keep failing, which makes CVODE reduce the timestep to extremely low values and results in the very long iteration. The order drops to 1, so this isn't connected with CVODE's sporadic tendency to stay at higher orders at low timesteps. I am not sure whether anything here would warrant a change in solver settings.
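The signature described above (fails spiking while the linear-to-nonlinear ratio drops) could be searched for automatically across a run. The following is only a minimal sketch: the function name, the argument names and both thresholds are hypothetical, and it assumes the counters are available as plain lists of per-output-step counts rather than cumulative totals.

```python
from statistics import median


def flag_solver_stress(nonlinear_iters, linear_iters, fails,
                       ratio_drop=0.5, fail_jump=5.0):
    """Return indices of output steps that match the slowdown signature.

    All three inputs are hypothetical per-output-step counts (assumed not
    to be cumulative totals), with one entry per output step.
    """
    # Linear-to-nonlinear iteration ratio for each output step;
    # None where no nonlinear iterations were recorded.
    ratios = [lin / nl if nl > 0 else None
              for lin, nl in zip(linear_iters, nonlinear_iters)]
    valid = [r for r in ratios if r is not None]
    if not valid:
        return []

    # Use the run-wide medians as the "quiescent" baseline.
    base_ratio = median(valid)
    base_fails = max(median(fails), 1)

    flagged = []
    for i, (r, f) in enumerate(zip(ratios, fails)):
        if r is None:
            continue
        # Flag a step only if the ratio drops AND the fail count spikes.
        if r < ratio_drop * base_ratio and f > fail_jump * base_fails:
            flagged.append(i)
    return flagged
```

The flagged indices can then be compared against the output steps where the front position changes, to check whether the solver stress consistently precedes front movement.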
I ran a test in which I restarted the simulation from just before the spike, to see whether the issue is caused by CVODE tuning its settings for the long quiescent period and then being "surprised" by the spike (thanks for the idea @bendudson). Unfortunately, the spike happens in the same way as before. The next step is to significantly reduce the timestep and rerun the simulation, to get a better handle on which dynamics are connected with the spike (the current timestep is 5 ms).
mikekryjak changed the title from "Sporadic extreme slowdowns in deep detachment after settling for very long time" to "Sporadic extreme slowdowns, possibly caused by Coulomb log discontinuity" on Oct 3, 2024