Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datetime units #782

Merged
merged 16 commits into from
Sep 18, 2024
Merged

Datetime units #782

merged 16 commits into from
Sep 18, 2024

Conversation

genematx
Copy link
Contributor

@genematx genematx commented Sep 5, 2024

This expands BuiltinDtype by adding explicit units for numpy datetime64 and timedelta64 dtypes, e.g. datetime64[ns]. This is required, for unambiguous representation of 8-byte datetime objects, specifically with zarr.

The BuiltinDtype class now defines an additional attribute, units, which is "" by default. The units are extracted from (and added to) the string representation of numpy.dtype, e.g. <M8[ns].

More information of supported units can be found here.

Checklist

  • Add a Changelog entry
  • Add the ticket number which this PR closes to the comment section

@danielballan
Copy link
Member

@joshmoore @MSanKeys963 If you have a moment I wonder if you could comment on plans for datetime data types in Zarr v3. At SciPy 2023 I remarked to Josh that they appeared to be missing from the spec and I think he suggested that they might be reintroduced. It does seem to be common in xarray examples, particularly climate examples.

The term datetime appears on that linked page once, but in the context of a "(currently made up) extension data type".

@joshmoore
Copy link

@danielballan: thanks for the ping. Unfortunately, I don't have any updates on datetime in Zarr v3. As you can imagine, the focus is still heavily on getting zarr-python v3 out the door, but I imagine there would be a lot of support for moving it forward. (I pointed to this PR and #774 at a recent dev meeting.)

# e.g. `'<M8[ns]'` has `units = 'ns'`.
units = ""
if dtype.kind in ("m", "M"):
if res := re.search(r"\[(.+)\]$", dtype.str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely there is a better way than running a regex on the string version of the dtype ?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried every attribute of dtype, but couldn't find it. I really don't like the regex solution, but this is the best I could come up with.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can not dig this out, I think it would be better to go with units = {dtype_obj: 'string', ..}[dtype] where we directly map the numpy dtype to the string we want for it. A bit of boilerplate for us, but it is clear what is going on and protects us from most of the things I can think of that numpy could change that would mess with us (and it does not user their str repr as a critcal API).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would mean we could also pick some units not to support https://numpy.org/doc/stable/reference/arrays.datetime.html#datetime-units

Copy link
Contributor Author

@genematx genematx Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out, there is a helper utility function in numpy to extract units from datetime dtypes, and the units can be even more complicated. Thank you, @danielballan for pointing this out! This is probably the cleanest solution, and I hope numpy wouldn't change this api. I don't think we need to keep the units (strings) and counts (int) separately, though -- a compound attribute is enough -- but keen to hear your thoughts on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is something I would expect them to keep stable.

@danielballan danielballan merged commit 337ebab into bluesky:main Sep 18, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants