Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract cubewrite fill value modifications #114

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Conversation

blimlim
Copy link
Collaborator

@blimlim blimlim commented Sep 27, 2024

This pr closes #99.

It extract's the cubewrite fill value modifications into a seperate function, adds a custom_fill_val optional argument, and adds unit tests.

Any suggestions or ideas are welcome!

@blimlim
Copy link
Collaborator Author

blimlim commented Sep 27, 2024

There's a slight amount of messiness with the added custom_fill_val argument, and so any ideas or suggestions would be great! Specifically, the final line of fix_fill_value forces the fill value to have the same type as the cube.data, which is required by the netCDF conventions

To prevent e.g. a float custom_fill_val from silently being rounded to an integer, I thought it would be good to raise an error when the types don't match:

if custom_fill_val is not None:
if type(custom_fill_val) == cube.data.dtype:
fill_value = custom_fill_val
else:
msg = (f"custom_fill_val type {type(custom_fill_val)} does not "
f"match cube {cube.name()} data type {cube.data.dtype}.")
raise TypeError(msg)

This will also complain e.g. if a float was going to be rounded down to a float32, however this is stricter than what's applied to the default fill value for floats, which is would get rounded down to a float32 if the the cube's data is float32:

elif cube.data.dtype.kind == 'f':
fill_value = DEFAULT_FILL_VAL_FLOAT
else:
fill_value = default_fillvals[
f"{cube.data.dtype.kind:s}{cube.data.dtype.itemsize:1d}"
]
# Use an array to force the type to match the data type
cube.attributes['missing_value'] = np.array([fill_value], cube.data.dtype)

I don't think it's too important, but mainly wanted to check whether we're happy for the type requirements for the custom_fill_val to be stricter than what's applied to the default values?

@blimlim blimlim marked this pull request as draft September 27, 2024 01:03
@blimlim
Copy link
Collaborator Author

blimlim commented Sep 27, 2024

Ah oops, didn't notice the additional places the fill_value is required. Will fix this now

else:
fill_value = get_default_fill_value(cube)

if type(fill_value) == cube.data.dtype:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My linter (flake8) complains here, suggesting do not compare types, for exact checks use "is" / "is not", for instance checks use "isinstance()"

I've had a play around, however neither is or isinstance work as desired when using cube.data.dtype in the comparison. Would be keen to gather any suggestions!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... neither is or isinstance work as desired when using cube.data.dtype in the comparison. Would be keen to gather any suggestions!

is will fail if the instances are not the same object.

The second aspect is numpy using internal types in place of Python's built in primitives:

>>> type(cube.data.dtype)
numpy.dtypes.Float64DType
>>> type(1.0)
<class 'float'>

That affects comparison operations (more on that in another comment).

@blimlim
Copy link
Collaborator Author

blimlim commented Oct 3, 2024

Ok, I've fixed my problem where I omitted the return value, and tried to clean up the structure. Rather than combining both of the following in a single function:

  1. getting the default fill value for a cube data's type
  2. Setting the cube's missing_data attribute to the fill value

I separated the first part into a get_default_fill_value() function, which is then called by the overall fix_fill_value() function that completes the other steps. I think it's ready for review, and any suggestions or ideas are welcome!


There are a couple of things to note:

The behaviour on the surface is not exactly the same as prior to this PR.

In this PR, we apply the type conversion to fill_value before it is supplied to either the missing_value or _FillValue attributes.

return np.array([fill_value], dtype=cube.data.dtype)[0]

Previously, the missing_value attribute was converted to the cube's data type, but the _FillValue was not. However _FillValue appears to be converted externally, and so I don't believe the additional conversion added in this PR has any impacts. I've added a comment to try and clarify this

# NB: the `_FillValue` attribute appears to be converted to match the
# cube data's type externally (likely in the netCDF4 library). It's not
# strictly necessary to do the conversion here. However, we need
# the separate `missing_value` attribute to have the correct type
# and so it's cleaner to set the type for both here.
# TODO: Is there a cleaner way do do the following conversion?
return np.array([fill_value], dtype=cube.data.dtype)[0]

Converting an ESM1.5 monthly output file results in identical missing_value and _FillValue types before and after this PR.

@blimlim blimlim marked this pull request as ready for review October 3, 2024 06:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extract cubewrite() fill value to function
2 participants