Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamping time series interface to ease adding new pH analysis #39

Open
ledm opened this issue Aug 15, 2022 · 20 comments
Open

Revamping time series interface to ease adding new pH analysis #39

ledm opened this issue Aug 15, 2022 · 20 comments
Assignees

Comments

@ledm
Copy link
Collaborator

ledm commented Aug 15, 2022

Julien P. has requested that we add a few more fields to the analysis.

While this may seem like a simple task, I think it would be a good opportunity to re-vamp the analysis_timeseries and analysis_compare interfaces. I'm thinking of good ways to do this, and I'll add some ideas to this issue but it might be good to have a chat about it and come up with a plan.

Two sets of files:

  • replace list (physkeys, etc) with yaml lists of individual model keys
  • replace individual analysis with a yaml file that contains:
    • all the input details
    • info about coordinates
    • info about obs
    • etc

or a single long massive yaml file:

  • make a master analysis yaml with:
    • list of switches to turn on and off individual keys
    • list of keys
      • list of details for each key

or ... Something else?

The problem that I had before with this was that you can't include a function in yaml, only dicts. Will explain on the phone.

@ledm ledm changed the title Adding new pH analysis Revamping time series interface to ease adding new pH analysis Aug 15, 2022
@ledm
Copy link
Collaborator Author

ledm commented Aug 15, 2022

The fields that Julien wants are:

  • "RIV_ALK" : alk flux added,
  • "TOT_SHALK" : total alkalinity added (because we want to know how much is added in total)
  • "ATM_PCO2" : atmospheric pCO2
  • "CO2FLUX" : CO2 air-sea flux
  • "ATM_XCO2"
  • "OCN_PCO2"

I propose adding these into a new suite called Alkalinity or something

@ledm
Copy link
Collaborator Author

ledm commented Aug 15, 2022

In terms of the interface, it currently works like this.

This is how analysis_timeseries creates a list of analyses to run:

bgcKeys = []
bgcKeys.append('N')  # WOA Nitrate
...
if analysisSuite.lower() in [ 'bgc',]:
    analysisKeys.extend(bgcKeys)

where analysisSuite is provided in the command line.

The details on how to run that analysis is then provided in a long function:

    if 'N' in analysisKeys:
        name = 'Nitrate'
        av[name]['modelFiles'] = listModelDataFiles(jobID, 'ptrc_T',
                                                    paths.ModelFolder_pref,
                                                    annual)
        if annual:
            av[name]['dataFile'] = WOAFolder + '/woa13_all_n00_01.nc'
        else:
            av[name]['dataFile'] = WOAFolder + '/nitrate_monthly_1deg.nc'

        av[name]['modelcoords'] = medusaCoords
        av[name]['datacoords'] = woaCoords

        av[name]['modeldetails'] = {
            'name': name,
            'vars': [
                'DIN',
            ],
            'convert': ukp.NoChange,
            'units': 'mmol N/m^3'
        }
        av[name]['datadetails'] = {
            'name': name,
            'vars': [
                'n_an',
            ],
            'convert': ukp.NoChange,
            'units': 'mmol N/m^3'
        }

        av[name]['layers'] = layerList
        av[name]['regions'] = regionList
        av[name]['metrics'] = metricList  #['mean','median', ]

        av[name]['datasource'] = 'WOA'
        av[name]['model'] = 'MEDUSA'

        av[name]['modelgrid'] = 'eORCA1'
        av[name]['gridFile'] = paths.orcaGridfn
        av[name]['Dimensions'] = 3

The autovivification (av) is not actually passed to timeseriesAnalysis. Instead, it is unpacked:

        tsa = timeseriesAnalysis(
            av[name]['modelFiles'],
            av[name]['dataFile'],
            dataType=name,
            modelcoords=av[name]['modelcoords'],
            modeldetails=av[name]['modeldetails'],
            datacoords=av[name]['datacoords'],
            datadetails=av[name]['datadetails'],
            datasource=av[name]['datasource'],
            model=av[name]['model'],
            jobID=jobID,
            layers=av[name]['layers'],
            regions=av[name]['regions'],
            metrics=av[name]['metrics'],
            workingDir=shelvedir,
            imageDir=imagedir,
            grid=av[name]['modelgrid'],
            gridFile=av[name]['gridFile'],
            clean=clean,
        )

There has got to be a better way to do this.

@ledm
Copy link
Collaborator Author

ledm commented Aug 15, 2022

There are two parts to this:

  1. Write a new interface that moves analysis details out of analysis_timeseries.py
  2. Migrate current contents of analysis_timeseries.py into new interface.

This is actually a kind of massive job, but I think getting it right would make the new user experience a lot better.

@valeriupredoi

This comment was marked as outdated.

@ledm

This comment was marked as outdated.

@valeriupredoi

This comment was marked as outdated.

@ledm

This comment was marked as outdated.

@valeriupredoi

This comment was marked as outdated.

@ledm
Copy link
Collaborator Author

ledm commented Aug 15, 2022

I'm going ahead with working on a keys interface right now. As I think this is the way forward. I don't think that putting this into a single massive config file is the right way.

@ledm

This comment was marked as outdated.

@valeriupredoi
Copy link
Owner

sounds good - the idea of a config file is to have anything that a user would tune on a regular basis, if there are a lot of config values that stay the same over long periods, we can either let them sit in the code or we can do a config-developer that is not touched by the users many times

@valeriupredoi
Copy link
Owner

how did a Py2->Py3 conversion expand the configuration rofl

It blew up the total number of lines:

python2:

        av[name]['modeldetails'] = {'name': name, 'vars': ['AEOLIAN', ],'convert': modeldustsum, 'units': 'Gmol Fe/yr'}

python3

        av[name]['modeldetails'] = {
            'name': name,
            'vars': [
                'AEOLIAN',
            ],
            'convert': modeldustsum,
            'units': 'Gmol Fe/yr'
        }

OK but that didn't change the configuration, it just put things neatly in a user-readable format 😁

@valeriupredoi
Copy link
Owner

also, if you want to move a lot of text to other files, you can think of CMOR-like files where we store all those variables

@ledm
Copy link
Collaborator Author

ledm commented Aug 15, 2022

My plan is the following code structure:

  • keys_yml - new directory

    • bgc.yml: a file with the bgc keys
    • physics.yml: a file with a list of physics keys
    • etc
  • variables_yml: new directory with lots of individual analysis files:

    • temperature.yml: single yml file with details for temperatrure analysis
    • Ice: single yml file with some ice analysis.
    • etc...

So yeah, kinda like cmor in esmvaltool.

(This is why I said it would be good to have a chat - heading to lunch now)

@ledm
Copy link
Collaborator Author

ledm commented Aug 15, 2022

@valeriupredoi
Copy link
Owner

I like it! I'd just not call dirs with _yml suffix 😁

@valeriupredoi
Copy link
Owner

open (Draft) PR too, man, so we can debug on the way

@ledm
Copy link
Collaborator Author

ledm commented Aug 16, 2022

Hey, so the AV stuff, I already did that, but in the GMD version of BGC-val:
https://github.com/ledm/bgc-val-public/blob/main/ini/ukesm_jasmin.ini

This writes them as a single long list, but I think individual files is the way forward instead, I'm wondering if there's some easy way of bringing those changes over.

The code that reads these files is:
https://github.com/ledm/bgc-val-public/blob/78798c774dc2e8c99107c2634c7327227c32d858/bgcvaltools/configparser.py

@ledm
Copy link
Collaborator Author

ledm commented Aug 16, 2022

Bringing in the code from that bgc-val-public:

  • Split ukesm_jasmin.ini into multiple single files and convert them to yaml format. Format change is:
    • Add --- at the start and brief description
    • Remove [name]
    • add bgcval2 before functions - new path.
  • Bring over functions directory and convert it to python3 and bgcval2 format.
    • @valeriupredoi - I'm currently struggling with 2to3 as a conversion for this code. It doesn't seem to work.
  • Set up tools, perhaps using bgc-val-public's config parser to load yaml file.

@ledm
Copy link
Collaborator Author

ledm commented Aug 16, 2022

Okay, I've got some progress here for a debug mode. It's running! I'll set up a draft PR next.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants