-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-26658: Re-implement the Formatter class #1018
Conversation
0f4c8c7
to
d6e5ce6
Compare
For the record, we have been asked to support reading a file from within a zip file (where a single zip file is registered as multiple datasets). This might affect some of our thinking here as to the best interface. |
8017aba
to
e271398
Compare
The |
3bd3e3a
to
005bc0a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. I've left a lot of comments because there's a lot here, but overall I'm very happy with all of this.
I think my biggest concern is that I'd like to make it really clear what all the storage classes are in the various places they appear in a formatter, for the various scenarios in which a formatter is used, and I could imagine trying to document/test that more rigorously leading to some changes to make it more consistent. There's a line comment thread on this we can use for further discussion.
cache_manager = DatastoreDisabledCacheManager(None, None) | ||
return cache_manager | ||
|
||
def read( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to use @typing.final
to enforce what the docs say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea. Except TypelessFormatter
only works because it overrides read
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that reinforces my thinking that we should move away from having a JSON formatter and a single YAML formatter that try to do everything, and specialize for Pydantic/dataclasses,etc. But not on this ticket.
This class provides a ``read()`` method that will run `FormatterV2.read` | ||
and coerce the return type using a variety of techniques. Use the | ||
standard `FormatterV2` methods for reading bytes/files and writing | ||
bytes/files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets me thinking that we should have different JSON/YAML formatters for different categories of target types (one for things that are natively pydantic models, another for those that have to_simple
and from_simple
, etc.).
May be too late for PropertySet
vs. TaskMetadata
, of course, but if we'd had different formatters earlier that transition would have been easier to handle, and I'd like to do better in future similar transitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have a PydanticFormatter implementation explicitly. This wouldn't have solved the PropertySet vs TaskMetadata problem though because those are entirely different types and the fix there would have been to have a specialist class from the beginning for task metadata that might have internally used a PropertySet rather than using a generic container class at the start.
205761f
to
27c5986
Compare
Entirely new FormatterV2 with a wrapper class to support legacy Formatter implementations. No longer subclass the read and write method but instead subclass * read_from_uri * read_from_stream * read_from_local_file * to_bytes * write_local_file
The tests still use V1 though
Else tests do not pass.
This is more akin to how dunder methods work in general and is a reasonable approach when we have three options and want to try each.
This allows the type ignore to be removed.
ci_middleware has mocks that look like they are all compatible with the requested python type when they aren't so by default a formatter can't naively compare types and decide that since they match no conversion is required. Always return False for Formatter.can_accept and force type coercion (which works with the mocks because of subclassing). Formatter can't know about mocks (which are in pipe_base) and returning False doesn't change anything in the default case because conversion will be a no-op. The mocks also track file conversions so if conversions are bypassed the tests confirming conversions won't pass.
This is now all handled internally in the formatter.
For read:
read_from_uri
gives you the full URI to do with whatever you want but might give you a locally file if it was in the cache.read_from_stream
gives you a file handle for the resource.read_from_local_file
gives you a guaranteed local file.Flags declare which of those 3 are available for a given formatter to allow the
FormatterV2.read
method to pick a suitable option based on what it knows (or if a zip file is in play, in which caseread_from_uri
can't be called).For write:
to_bytes
is called first (can be Not Implemented)write_local_file
The default implementation of
write_local_file
callsto_bytes
.All daf_butler formatters ported to V2. There are some V1 test formatters since otherwise nothing will test
FormatterV1inV2
.This mostly seems to work but there are some things I still need to do:
Checklist
doc/changes
configs/old_dimensions