Dimensionless Units #55
This is not a check. You just call the actual conversion function, which is supposed to happen in `process`. I think you should instead check whether the variable has supplied dimensionless units, and whether those are defined in a way that pint will be able to handle.
okay. will rewrite it.
Question: how can I know the unit in the netCDF file without accessing it? From the rule object alone I can check whether the `cmor_variable` (table unit) is dimensionless and whether a unit mapping for it is found in dimensionless_units, but what about the source units? If `model_variable` is defined then I can check it as well, but that means the user wants to use `model_variable` instead of the units found in the netCDF file.
Consider the following table.
If I just check the `tableunit`, then I cannot catch `sidmasstranx` failing, and for `sitimefrac` the check will fail because it is dimensionless and there is no entry in the mapping for it yet. But even if the user adds a mapping entry for this variable, it still fails, because the source units (from the netCDF file or the user-defined modelunits) are still dimensionless and the unit conversion function does not consider using the mapping for source units.
The rewritten code only deals with dimensionless variables (i.e., it ignores the `sidmasstranx` case) and addresses the rest of the cases.
Talking to @pgierz, we decided not to support `psu` in the end, because in models other than FESOM there might be different species of salt and the g/kg equivalence will only be approximate. In that case, users will need to create a rule themselves to integrate their species of salt and provide an adequate conversion.
Apologies for changing my mind about it @siligam, I had not considered these details when I suggested supporting psu.
About the function `handle_unit_conversion(da: xr.DataArray, rule: Rule)`: would it make sense to have a third argument `dryrun=False`, so that this function can be used before computation in a loop over the rules, to report wrong units on the model side, missing mappings for the tables, etc.?
We could do this as long as it has a default argument of `dryrun=False`; otherwise we would break the pipeline API.
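A minimal sketch of the `dryrun` idea discussed here, not the actual pymorize code. The `Rule` dataclass, the `DIVISORS` table, and `report_unit_problems` are hypothetical stand-ins, and a plain list replaces the `xr.DataArray`. Because `dryrun` has a default, the existing two-argument call keeps working.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical stand-in for the real Rule object."""
    name: str
    model_unit: str
    table_unit: str


# Hypothetical lookup: how many source units make one target unit.
# A real implementation would ask pint instead of a hard-coded table.
DIVISORS = {("g", "kg"): 1000, ("kg", "kg"): 1}


def handle_unit_conversion(da, rule, dryrun=False):
    """Convert `da` from rule.model_unit to rule.table_unit.

    With dryrun=True only the unit lookup runs, so unit problems
    surface without touching the data.
    """
    key = (rule.model_unit, rule.table_unit)
    if key not in DIVISORS:
        raise ValueError(
            f"{rule.name}: cannot convert {rule.model_unit!r} to {rule.table_unit!r}"
        )
    if dryrun:
        return da  # data is returned unchanged
    return [value / DIVISORS[key] for value in da]


def report_unit_problems(rules):
    """Loop over the rules before any computation and collect unit errors."""
    problems = []
    for rule in rules:
        try:
            handle_unit_conversion([], rule, dryrun=True)
        except ValueError as err:
            problems.append(str(err))
    return problems
```

Calling `report_unit_problems(rules)` up front would list every rule whose units cannot be converted, instead of failing on the first one in the middle of a run.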
Correct, I would set the parameter to always be `False` unless specified otherwise. I suggested `dryrun` as the name, but if you would rather have something like `check`, that's also a good name for me.
I think instead of the option `dryrun=False`, which breaks the API, the same thing can be achieved through a command-line option with a meaningful name like "check" or "validate". At the moment, the check that compares the frequency of the source data with the frequency the data needs to be converted to (according to the frequency mentioned in the table) is implemented in `CMORizer._post_init_checks`. The unit conversion check can also be added there. The command-line option can skip the dask cluster creation and construct the CMORizer object by populating the rules, so that `_post_init_checks` has all the information it needs to validate the user inputs.
Will the following break the pipeline API? Why?
Isn't `_post_init_checks` run before the dask cluster initialization?
I agree with this.
At the moment, any step in a pipeline must have the signature:
I think it would be cleanest to turn `_post_init_checks` into a main CMORizer object method (e.g. `validate`), which you can run independently of `process`. In my view, constructing the CMORizer object should still be possible, even if you fill it with rubbish rules (this makes testing easier). So, something like this:
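A minimal sketch of this proposal, not the actual pymorize code. Rules are modelled as plain dicts, and the only check is a hypothetical one (a rule must name a `cmor_variable`). Having `process` call `validate` first is an assumption of this sketch.

```python
class CMORizer:
    """Hypothetical, stripped-down stand-in for the real CMORizer."""

    def __init__(self, rules):
        # Construction never validates, so tests can build
        # objects from rubbish rules.
        self.rules = list(rules)

    def validate(self):
        """Return a list of problems found in the rules (empty if none)."""
        problems = []
        for index, rule in enumerate(self.rules):
            if not rule.get("cmor_variable"):
                problems.append(f"rule {index} has no cmor_variable")
        return problems

    def process(self):
        """Refuse to run when validation reports problems."""
        problems = self.validate()
        if problems:
            raise ValueError("; ".join(problems))
        # Placeholder for the real work: just echo the variable names.
        return [rule["cmor_variable"] for rule in self.rules]
```

A command-line "validate" option could then build the object and call only `validate()`, without creating a dask cluster.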
Thanks @pgierz, for clarifying that.
+1
`to_unit` might not exist in the `dimensionless_unit_mappings` because we might have made a mistake while writing the yaml. Can you add error handling here? Something like:
I guess the pymorize_cfg get dimensionless_mapping_table won't work since it is out of scope, but it could be included in the rule, so that we can report in the error exactly in which file the unit is missing.
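The error handling requested here could look roughly like the sketch below. The layout of the mapping (`{variable: {table_unit: pint_unit}}`), the function name `lookup_dimensionless_unit`, and the `source` argument that carries the yaml file name are all assumptions made for illustration.

```python
def lookup_dimensionless_unit(mappings, variable, to_unit, source="<unknown file>"):
    """Return the mapped unit or raise an error naming the mapping file.

    `mappings` is assumed to look like {variable: {table_unit: pint_unit}}.
    """
    if variable not in mappings:
        raise ValueError(
            f"Variable {variable!r} has no entry in the dimensionless "
            f"unit mappings ({source})."
        )
    per_variable = mappings[variable]
    if to_unit not in per_variable:
        raise ValueError(
            f"Unit {to_unit!r} of variable {variable!r} is missing from the "
            f"dimensionless unit mappings ({source}). Please add it there."
        )
    return per_variable[to_unit]
```

Passing the file name along with the rule, as suggested in the comment, lets the message point at the exact file that needs fixing.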
good point. will write the error handling thing.
Somewhere here, there should be a dimensionless, or "weird-unit", checker that raises an error if a variable cannot be transformed by `pint` and suggests that the user/developer include a mapping of units in the dimensionless_unit_mappings.
Something with the conversion still isn't working correctly: this new unit test fails and the result is a factor of 1000 too large (which is exactly the number of g per kg).
This could be because of the unit defined in cf_xarray: https://github.com/xarray-contrib/cf-xarray/blob/main/cf_xarray/units.py#L111
Let me try to redefine it to `g/kg` and see if that test passes.
So, if `1 g/kg` is approximately `1 psu`, then `psu` must be defined as `"practical_salinity_unit = g/kg = psu = PSU"` instead of `"practical_salinity_unit = [] = psu = PSU"` (as defined in cf_xarray), and then it works.
Consider the following code snippet
As long as the salinity is expressed in `gram/kilogram` or `milligram/gram`, the magnitude is `1`. The magnitude is `0.001` when it is expressed in `kg/kg` or `g/g`, which means that in dimensionless_units_mapping `0.001` should be either `kg/kg` or `g/g`.
This means that in the model output the data is expressed in `psu (g/kg)`, but in the CMOR tables the data is expected to be expressed as `kg/kg` or `g/g`.
1 psu => 1 g/kg => 1 g/(1000 g) => 0.001 g/g => 0.001 dimensionless
1 psu => 1 g/kg => 1 (0.001 kg/kg) => 0.001 kg/kg
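The arithmetic in the two lines above can be reproduced with a few lines of plain Python. This is only a sketch of the bookkeeping with exact fractions; it does not use pint, the `GRAMS` table and `ratio_magnitude` helper are made up for illustration, and it does not settle whether the CMOR output should be rescaled.

```python
from fractions import Fraction

# Mass of each unit expressed in grams.
GRAMS = {"mg": Fraction(1, 1000), "g": Fraction(1), "kg": Fraction(1000)}


def ratio_magnitude(value, from_ratio, to_ratio):
    """Re-express a mass ratio such as 'g/kg' in another ratio such as 'kg/kg'."""

    def scale(ratio):
        numerator, denominator = ratio.split("/")
        return GRAMS[numerator] / GRAMS[denominator]

    return Fraction(value) * scale(from_ratio) / scale(to_ratio)


print(ratio_magnitude(1, "g/kg", "mg/g"))   # 1
print(ratio_magnitude(1, "g/kg", "kg/kg"))  # 1/1000
print(ratio_magnitude(1, "g/kg", "g/g"))    # 1/1000
```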
@christian-stepanek, @chrisdane, could you please explain here? I'm 100% confident that we should not be changing the magnitude of the data in this case.
Indeed, g/kg is in this case the same as the CMOR unit, just called by a different name. Unit conversions may involve factors or adding offsets, but not in this case.