Extract `cubewrite` fill value modifications #114

blimlim · 2024-09-27T00:49:47Z

This PR closes #99.

It extracts the `cubewrite` fill value modifications into a separate function, adds an optional `custom_fill_val` argument, and adds unit tests.

Any suggestions or ideas are welcome!

blimlim · 2024-09-27T00:51:57Z

There's a slight amount of messiness with the added `custom_fill_val` argument, so any ideas or suggestions would be great! Specifically, the final line of `fix_fill_value` forces the fill value to have the same type as `cube.data`, which is required by the netCDF conventions.

To prevent e.g. a float custom_fill_val from silently being rounded to an integer, I thought it would be good to raise an error when the types don't match:

um2nc-standalone/umpost/um2netcdf.py

Lines 334 to 340 in f3fbd44

    if custom_fill_val is not None:
        if type(custom_fill_val) == cube.data.dtype:
            fill_value = custom_fill_val
        else:
            msg = (f"custom_fill_val type {type(custom_fill_val)} does not "
                   f"match cube {cube.name()} data type {cube.data.dtype}.")
            raise TypeError(msg)

This will also complain if, e.g., a float was going to be rounded down to a float32. However, this is stricter than what's applied to the default fill value for floats, which would get rounded down to a float32 if the cube's data is float32:

um2nc-standalone/umpost/um2netcdf.py

Lines 342 to 350 in f3fbd44

    elif cube.data.dtype.kind == 'f':
        fill_value = DEFAULT_FILL_VAL_FLOAT
    else:
        fill_value = default_fillvals[
            f"{cube.data.dtype.kind:s}{cube.data.dtype.itemsize:1d}"
        ]
    # Use an array to force the type to match the data type
    cube.attributes['missing_value'] = np.array([fill_value], cube.data.dtype)

I don't think it's too important, but mainly wanted to check whether we're happy for the type requirements for the custom_fill_val to be stricter than what's applied to the default values?
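To make the rounding concern concrete, here is a minimal sketch of the one-element-array cast used in the last line of the snippet above. The helper name `cast_fill_value` and the sample values are illustrative assumptions, not code from this PR.

```python
import numpy as np


def cast_fill_value(fill_value, dtype):
    """Cast a scalar to `dtype` via a one-element array (hypothetical helper)."""
    return np.array([fill_value], dtype=dtype)[0]


# A float cast to an integer type is truncated without any warning.
as_int = cast_fill_value(-1.5, np.int32)
print(as_int, as_int.dtype)

# A Python float (64-bit) cast to float32 loses precision, also silently.
# 1e20 is only an example value here.
as_f32 = cast_fill_value(1e20, np.float32)
print(float(as_f32) == 1e20)
```

Both casts succeed quietly, which is why an explicit check on `custom_fill_val` is needed if silent rounding should be reported as an error.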

blimlim · 2024-09-27T01:04:07Z

Ah oops, didn't notice the additional places the fill_value is required. Will fix this now


blimlim · 2024-10-03T05:20:24Z

umpost/um2netcdf.py

+    else:
+        fill_value = get_default_fill_value(cube)
+
+    if type(fill_value) == cube.data.dtype:


My linter (flake8) complains here, suggesting: "do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`".

I've had a play around; however, neither `is` nor `isinstance` works as desired when using `cube.data.dtype` in the comparison. Would be keen to gather any suggestions!

truth-quark · 2024-10-03

> ... neither is or isinstance work as desired when using cube.data.dtype in the comparison. Would be keen to gather any suggestions!

`is` will fail if the instances are not the same object.

The second aspect is numpy using internal types in place of Python's built-in primitives:

    >>> type(cube.data.dtype)
    numpy.dtypes.Float64DType
    >>> type(1.0)
    <class 'float'>

That affects comparison operations (more on that in another comment).

blimlim · 2024-10-03T06:26:42Z

Ok, I've fixed my problem where I omitted the return value, and tried to clean up the structure. Rather than combining both of the following in a single function:

- Getting the default fill value for a cube data's type
- Setting the cube's missing_value attribute to the fill value

I separated the first part into a get_default_fill_value() function, which is then called by the overall fix_fill_value() function that completes the other steps. I think it's ready for review, and any suggestions or ideas are welcome!
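The split described above could look roughly like the sketch below. It is self-contained for illustration only. The cube is a `SimpleNamespace` stand-in rather than an Iris cube, and the constant values and the `ILLUSTRATIVE_INT_FILLVALS` table are assumptions. The custom fill value type check discussed earlier is omitted, and the actual functions in `um2netcdf.py` may differ.

```python
from types import SimpleNamespace

import numpy as np

# Assumed values for illustration; the real module may use other defaults.
DEFAULT_FILL_VAL_FLOAT = 1e20
ILLUSTRATIVE_INT_FILLVALS = {"i4": -2147483647, "i8": -9223372036854775806}


def get_default_fill_value(cube):
    """Step 1: look up a default fill value from the cube data's dtype."""
    dtype = cube.data.dtype
    if dtype.kind == "f":
        return DEFAULT_FILL_VAL_FLOAT
    return ILLUSTRATIVE_INT_FILLVALS[f"{dtype.kind}{dtype.itemsize}"]


def fix_fill_value(cube, custom_fill_value=None):
    """Step 2: pick the fill value, cast it, set missing_value, return it."""
    # `is not None` rather than truthiness, so a custom value of 0 is kept.
    if custom_fill_value is not None:
        fill_value = custom_fill_value
    else:
        fill_value = get_default_fill_value(cube)

    # A one-element array forces the value to the cube data's dtype.
    typed = np.array([fill_value], dtype=cube.data.dtype)
    cube.attributes["missing_value"] = typed
    return typed[0]


# Stand-in objects exposing only `.data` and `.attributes`.
float_cube = SimpleNamespace(data=np.zeros(3, dtype=np.float32), attributes={})
int_cube = SimpleNamespace(data=np.zeros(3, dtype=np.int32), attributes={})

float_fill = fix_fill_value(float_cube)
int_fill = fix_fill_value(int_cube, custom_fill_value=0)
print(float_fill.dtype, int_fill, int_fill.dtype)
```

Returning the typed value lets the caller reuse it for `_FillValue`, so both attributes end up with the cube data's type.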

There are a couple of things to note:

The behaviour on the surface is not exactly the same as prior to this PR.

In this PR, we apply the type conversion to fill_value before it is supplied to either the missing_value or _FillValue attributes.

um2nc-standalone/umpost/um2netcdf.py

Line 354 in e0048d4

    return np.array([fill_value], dtype=cube.data.dtype)[0]

Previously, the missing_value attribute was converted to the cube's data type, but the _FillValue was not. However, _FillValue appears to be converted externally, so I don't believe the additional conversion added in this PR has any impact. I've added a comment to try to clarify this:

um2nc-standalone/umpost/um2netcdf.py

Lines 344 to 351 in e3b98f3

    # NB: the `_FillValue` attribute appears to be converted to match the
    # cube data's type externally (likely in the netCDF4 library). It's not
    # strictly necessary to do the conversion here. However, we need
    # the separate `missing_value` attribute to have the correct type
    # and so it's cleaner to set the type for both here.
    # TODO: Is there a cleaner way do do the following conversion?
    return np.array([fill_value], dtype=cube.data.dtype)[0]

Converting an ESM1.5 monthly output file results in identical missing_value and _FillValue types before and after this PR.

blimlim added 3 commits September 25, 2024 12:39

Extract cubewrite fill value modifications into separate function

72c183f

Add custom fill value type check

239a69e

More explicit custom_fill_val check to allow 0 values

f3fbd44

blimlim requested a review from truth-quark September 27, 2024 00:52

blimlim marked this pull request as draft September 27, 2024 01:03

blimlim added 6 commits September 27, 2024 12:25

Split fill value modifications into two steps

0a57c66

Add return value to fix_fill_value

f800a43

clean up

7a812be

Merge branch '99/fill-values-split-functions-test' into 99/fill-values

a538df0

to separate the fill value modifications into two steps

Restructure and add custom_fill_value argument

9fd827b

Typos

014bd37

blimlim added 2 commits October 3, 2024 16:20

Clarifying comment on type conversion

e0048d4

clean up comments

e3b98f3

blimlim marked this pull request as ready for review October 3, 2024 06:26
