Separate input output #629

Draft
wants to merge 2 commits into
base: main

sebastianbeyer
Collaborator

This PR separates the inputs from the outputs. That should make runs more 'predictable', in that running with the same inputs and the same config/namelists should lead to the same results.
It also saves time, because we no longer need to wait while the restart files are backed up (this is the main motivation in the DestinE project). The change is implemented so that the old default behavior stays the same.

It introduces a new namelist option RestartPath, which defaults to the string same-as-result. In that case it is set to the same value as ResultPath, so the default behavior is not changed.
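
The fallback described above can be sketched in a few lines. This is an illustration in Python, not the Fortran in this PR; the function name `resolve_restart_path` is hypothetical, and only the option names RestartPath and ResultPath and the sentinel string same-as-result come from the description.

```python
# Sentinel default for RestartPath, as named in the PR description.
SAME_AS_RESULT = "same-as-result"


def resolve_restart_path(result_path: str, restart_path: str = SAME_AS_RESULT) -> str:
    """Return the directory that restarts are read from (hypothetical helper).

    If RestartPath was left at its sentinel default, fall back to ResultPath,
    so that existing setups keep their old behavior.
    """
    if restart_path.strip() == SAME_AS_RESULT:
        return result_path
    return restart_path
```

With the default, `resolve_restart_path("./results/")` gives back `./results/`; an explicitly set RestartPath is returned unchanged.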

I also want a separate namelist option for RestartOutPath or similar, so that the restarts can be completely uncoupled from the outputs.

So far, I have only tested raw restarts.

@sebastianbeyer sebastianbeyer self-assigned this Sep 20, 2024
@sebastianbeyer sebastianbeyer added the enhancement New feature or request label Sep 20, 2024
@sebastianbeyer
Collaborator Author

sebastianbeyer commented Sep 22, 2024

Some questions:

  1. What about mesh.diag.nc and oce.blowup.nc? Should they keep going to ResultPath?
  2. Would it be helpful to encode the current date of the restarts in the file or path name, as is done for the netcdf restarts? It would then be possible to have a common restart directory for a whole run, and the particular restarts for one chunk would be selected by the clock file.

@JanStreffing
Collaborator

I think oce.blowup.nc can stay where it is, named as is. It should always be the last time step, which will be clear from the log file.

For mesh.diag.nc I can imagine a future use case with a moving cavity between restarts, where time coding it might make sense. However, changing its name might initially break a few postprocessing scripts that rely on this file for mesh info.

@sebastianbeyer
Collaborator Author

Ah, I meant the time coding for the raw and bin restart files, not mesh.diag or blowup.
But thinking about it again, that poses a problem: when the clock file is also in the RestartOutPath and we need it to work out which folder to restart from, we get a chicken-and-egg problem ;)
I would like to get rid of the clock file in general, btw…

Sebastian Beyer added 2 commits September 24, 2024 18:38
Separating the inputs from the outputs. This should enable us to have
more 'predictable' runs, in that running with the same inputs and the
same config/namelists should lead to the same results.
We can save time because we do not need to wait during backuping the
restart files (this is the main motivation in DestinE project).
Still this is done so that the old default behaviour is not changed.
@sebastianbeyer
Collaborator Author

What would be your preference for the clock file(s)? Keep them in one central place (ResultPath, or the working directory where the namelist files are), so that they are always updated with the current date? I think that is how they were intended, but it does not fit well with the idea that the same config always produces the same run: the clock file contains state that somehow needs to be managed.

@JanStreffing JanStreffing added this to the FESOM 2.6.1 milestone Sep 28, 2024
@sebastianbeyer
Collaborator Author

I discussed this a little with @JanStreffing, and we thought it might be better not to separate the input and output through namelist options, but rather to encode the date of the restarts into the path of the restart files. It would then always be clear how they are being written, and they would never be overwritten. So the path would not be e.g.
trim(ResultPath)//"fesom_raw_restart/np"//int_to_txt(partit%npes)
but
trim(ResultPath)//"fesom_raw_restart/"//date//"_np"//int_to_txt(partit%npes)
where date would be e.g. 20200114, so the path for the resulting restarts would be
./fesom_raw_restart/20200114_np512.
We also discussed removing the check on the time step in the restarts, so that any restart file can be used for any run, as long as it is in the correct path (determined by the clock file). This applies to the netcdf restarts, not the raw restarts, btw.
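
To make the two naming schemes concrete, here is a minimal sketch in Python that mirrors the string concatenations quoted in this comment. The function names are hypothetical; the directory name fesom_raw_restart, the `_np` suffix and the example values are taken from the comment itself.

```python
def raw_restart_dir_old(result_path: str, npes: int) -> str:
    """Current scheme: one directory per partition count, overwritten by each chunk."""
    return f"{result_path}fesom_raw_restart/np{npes}"


def raw_restart_dir_dated(result_path: str, date: str, npes: int) -> str:
    """Proposed scheme: the restart date is part of the directory name."""
    return f"{result_path}fesom_raw_restart/{date}_np{npes}"
```

For ResultPath `./`, date `20200114` and 512 processes, the dated variant gives `./fesom_raw_restart/20200114_np512`, so restarts written on different dates land in different directories.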

I would still like to keep this PR as a base, because I think that the logic for reading and writing is a bit more clear now with a separate function for reading and for writing. What do you think?

@patrickscholz
Contributor

I see one small issue there: when you do something like
trim(ResultPath)//"fesom_raw_restart/"//date//"_np"//int_to_txt(partit%npes)
where date would be e.g. 20200114, so the path for the resulting restarts would be

with year|mon|day, you always assume that the model finishes an entire day. But especially for debugging, you might want to write a restart at a specific point in time before the model blows up, which does not correspond to a full day. In that case the restart might not fully work, especially if we want to get rid of the clock files. It might therefore be smart to use something like year|secondsinyear instead of year|mon|day to describe the restart moment, although that is less convenient for a human to read. Or maybe even year|mon|day|secondsinday.
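
A year|mon|day|secondsinday stamp could look like the following sketch. It is written in Python for illustration only; the function name, the underscore separator and the five-digit zero padding are assumptions, not something decided in this thread.

```python
from datetime import datetime


def restart_stamp(t: datetime) -> str:
    """Format a restart moment as year|mon|day plus seconds elapsed in that day.

    The "_" separator and the 5-digit padding are assumed for readability.
    """
    seconds_in_day = t.hour * 3600 + t.minute * 60 + t.second
    return f"{t:%Y%m%d}_{seconds_in_day:05d}"
```

A restart written at 06:30 on 14 January 2020 would be stamped `20200114_23400`, while one written at the very start of that day would be `20200114_00000`, so sub-daily restarts no longer collide.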

@sebastianbeyer
Collaborator Author

Good point! We usually don't do such short runs, but for debugging it makes sense. I would prefer your second option, year|mon|day|secondsinday, because secondsinyear is just too big for me to understand ;)

About the clock files, I also discussed that a little with Jan, and I can now see their value (even if I don't really love them ;) ). Somewhere the model needs to keep track of where it currently is in time. Without clock files we would need to give that information in the namelists (e.g. always give startdate and runlength), but that would require changing the namelist after every chunk of a long run. That's totally possible, and with a workflow manager like ESMtools it would be easy, but for quick standalone runs it might be easier to stay with the clock file... Both solutions have pros and cons. If this were starting from scratch I would prefer the solution via namelist (and/or command line option), but in the end both are fine, I guess? Have you already had some discussions about this among the FESOM developers?

For the clock files I would still like to get rid of the first line, though; I don't see why we need to have both time periods. If the time step that was used in the previous step is important (maybe for multi-step numerical methods, no idea if that is used?), we should just give that explicitly, resulting in a clock file like this:

240 84600.0 365 1958

Or maybe the other way round?

1958 365 84600 240
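
For illustration, a single-line clock file in the first ordering could be read as in the Python sketch below. The field meanings (time step, seconds in day, day of year, year) are an assumption inferred from this comment and are not confirmed anywhere in the thread; `ClockState` and `parse_clock_line` are hypothetical names.

```python
from typing import NamedTuple


class ClockState(NamedTuple):
    # Field meanings are assumed, not confirmed in this thread.
    time_step: float  # time step used in the previous run
    seconds: float    # seconds elapsed in the current day
    day: int          # day of year
    year: int


def parse_clock_line(line: str) -> ClockState:
    """Parse one whitespace-separated clock line: time_step seconds day year."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    time_step, seconds, day, year = fields
    return ClockState(float(time_step), float(seconds), int(day), int(year))
```

Parsing the example line `240 84600.0 365 1958` yields a time step of 240, 84600.0 seconds, day 365 and year 1958. The reversed ordering would only need the unpacking order swapped.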


3 participants