Speed up loading for ACCESS-ESM non-CMOR datasets #2487
Comments
Hi @bouweandela, @valeriupredoi,
We are encountering an issue with the output of the ACCESS-ESM model, specifically with the atmospheric data. The data is stored as follows:
./atm/netCDF/:
HI-CN-05.pa-185001_mon.nc
HI-CN-05.pa-185002_mon.nc
HI-CN-05.pa-185003_mon.nc
HI-CN-05.pa-185004_mon.nc
HI-CN-05.pa-185005_mon.nc
HI-CN-05.pa-185006_mon.nc
HI-CN-05.pa-185007_mon.nc
HI-CN-05.pa-185008_mon.nc
HI-CN-05.pa-185009_mon.nc
HI-CN-05.pa-185010_mon.nc
All monthly variables are stored in a single netCDF file. Currently, our configuration is:
ACCESS:
  cmor_strict: false
  input_dir:
    default:
      - '{dataset}/{sub_dataset}/{exp}/{modeling_realm}/netCDF'
  input_file:
    default: '{sub_dataset}.{special_attr}-*.nc'
  output_file: '{project}_{dataset}_{mip}_{exp}_{institute}_{sub_dataset}_{special_attr}_{short_name}'
  cmor_type: 'CMIP6'
  cmor_default_table_prefix: 'CMIP6_'
This configuration results in ESMValCore analyzing all files and variables, which consumes excessive time and resources. I have suggested the following to @rhaegar325:
We are aware that storing all variables in a single file is not optimal for this setup. However, we currently have no alternative. Any advice would be very welcome! Thanks, |
Selecting the files within the specified timerange should already work, as it does for CMIP6 etc. Did you check in the ESMValCore/esmvalcore/local.py Lines 66 to 68 in 546937f
You could probably implement this in the
esmvalcore.preprocessor.load so it skips the actual load step if the input is already a cube, similar to what I tried out in #2454. In the longer term, we would like to implement a more flexible loading mechanism (see #2371), but we will first need to find funding for that.
|
Thanks @bouweandela, that is really useful. We are going to look into this. |
Time gating is one side of the problem, as Bouwe points out. Another is variable selection, which we no longer do at load point (we used to have an iris Constraint at the raw load point, though). What you can do about it is overload the load with a constraint; see load_raw and its usage. If this is a bit too much of a hassle, you can perform the single-variable loading via a fix, so that it runs ahead of everything else: a rather agricultural solution, but a fairly hassle-free one in me books 🍺 |
Hi, development team,
In the last few months we developed a CMORiser for ACCESS-ESM raw data in ESMValCore. However, the data are stored differently: CMORised data typically hold a single variable over the full time range in one file, whereas ACCESS-ESM stores all variables for one timestamp in one file. If we use the default ESMValCore loading for ACCESS-ESM data, this causes a huge time and memory cost. So I was wondering whether we could build a load method for ACCESS-ESM raw data; that would be super helpful. It doesn't need to be complex: just a filter that selects the files within the time range specified in the recipe would be good.
I am opening this issue to see if anyone has a good idea about how to do that. I am willing to implement it myself; I just need to know which approach is best and acceptable to all of us.
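To make the requested filter concrete, here is a minimal sketch using only the standard library. It assumes file names end in a YYYYMM stamp followed by "_mon.nc" (as in HI-CN-05.pa-185001_mon.nc) and a simplified time range of the form 'YYYYMM/YYYYMM' or 'YYYY/YYYY'. The function names are hypothetical and are not part of ESMValCore.

```python
import re
from pathlib import Path

# Assumed file-name pattern, e.g. "HI-CN-05.pa-185001_mon.nc":
# a YYYYMM stamp between "-" and "_mon.nc".
_STAMP = re.compile(r"-(\d{4})(\d{2})_mon\.nc$")


def parse_timerange(timerange):
    """Turn 'YYYYMM/YYYYMM' or 'YYYY/YYYY' into inclusive YYYYMM integers."""
    start, end = timerange.split("/")
    # A bare year expands to January (start) or December (end).
    start_int = int(start) if len(start) == 6 else int(start) * 100 + 1
    end_int = int(end) if len(end) == 6 else int(end) * 100 + 12
    return start_int, end_int


def select_files(filenames, timerange):
    """Keep only the files whose YYYYMM stamp lies within the time range."""
    start, end = parse_timerange(timerange)
    selected = []
    for name in filenames:
        match = _STAMP.search(Path(name).name)
        if match is None:
            continue  # not a monthly file following the assumed pattern
        stamp = int(match.group(1) + match.group(2))
        if start <= stamp <= end:
            selected.append(name)
    return sorted(selected)


files = [f"HI-CN-05.pa-1850{month:02d}_mon.nc" for month in range(1, 11)]
subset = select_files(files, "185003/185005")
```

With the ten files listed in this thread, the time range '185003/185005' keeps only the March, April and May files, so only three files would need to be opened.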