Don't save state for crashing MPI simulations
Sbozzolo committed Dec 22, 2023
1 parent 573f6fd commit 6971880
Showing 1 changed file with 7 additions and 2 deletions.
9 changes: 7 additions & 2 deletions src/solver/solve.jl
@@ -68,8 +68,13 @@ function solve_atmos!(simulation)
             return AtmosSolveResults(sol, :success, walltime)
         end
     catch ret_code
-        CA.save_restart_func(integrator, simulation.output_dir)
-        CA.save_to_disk_func(integrator, simulation.output_dir)
+        if !CA.is_distributed(comms_ctx)
+            # We can only save when not distributed because we don't have a way to sync the
+            # MPI processes (maybe just one MPI rank crashes, leading to a hanging
+            # simulation)
+            CA.save_restart_func(integrator, simulation.output_dir)
+            CA.save_to_disk_func(integrator, simulation.output_dir)
+        end
         @error "ClimaAtmos simulation crashed. Stacktrace for failed simulation" exception =
             (ret_code, catch_backtrace())
         return AtmosSolveResults(nothing, :simulation_crashed, nothing)
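
The guard introduced in this hunk can be tried in isolation with the minimal sketch below. Everything in it is a hypothetical stand-in: `SingleProcessContext`, `DistributedContext`, `is_distributed`, and `run_guarded` are illustrative names defined here, not the ClimaAtmos or ClimaComms API, and the `save` callback only plays the role of the two save calls in the diff. It assumes, as the commit comment suggests, that saving from a crash handler is only safe when no other ranks could be left waiting.

```julia
# Hypothetical stand-ins; these are NOT the ClimaAtmos or ClimaComms types.
abstract type AbstractContext end
struct SingleProcessContext <: AbstractContext end
struct DistributedContext <: AbstractContext end

is_distributed(::SingleProcessContext) = false
is_distributed(::DistributedContext) = true

# Run `step`; if it throws, call `save` only when the run is not distributed.
# `save` stands in for the restart/diagnostic writers shown in the diff.
function run_guarded(step, save, ctx::AbstractContext)
    try
        step()
        return :success
    catch err
        if !is_distributed(ctx)
            # No peer ranks exist, so the save cannot wait on anyone.
            save()
        end
        # Distributed case: the save is skipped. If it needed the other ranks
        # and only this rank crashed, the job could hang instead of exiting.
        @error "simulation crashed" exception = (err, catch_backtrace())
        return :simulation_crashed
    end
end

# Usage: count how often the save callback actually runs.
saved = Ref(0)
failing_step() = error("boom")
record_save() = (saved[] += 1)

status_single = run_guarded(failing_step, record_save, SingleProcessContext())
status_mpi = run_guarded(failing_step, record_save, DistributedContext())
```

Both calls report a crash, but only the single-process one invokes the save callback. The trade-off is that a crashed MPI run leaves no crash-time state on disk, in exchange for not risking a hung job.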
