DM-26658: Re-implement the Formatter class #1018

timj · 2024-05-17T23:50:42Z

Use entirely new class: FormatterV2
Provide FormatterV1inV2 shim class.
Since FormatterV2 needs access to the cache, moved all the cache code out of Datastore and into FormatterV2 itself.
A formatter can provide up to three read methods and two write methods.
Can now read content from within zip archives.

For read:

read_from_uri gives you the full URI to do with whatever you want, but it might give you a local file if the resource was in the cache.
read_from_stream gives you a file handle for the resource.
read_from_local_file gives you a guaranteed local file.

Flags declare which of those three are available for a given formatter. This lets the FormatterV2.read method pick a suitable option based on what it knows, including whether a zip file is in play, in which case read_from_uri can't be called.
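The flag-based selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `SketchFormatter`, the flag names `can_read_from_uri`, `can_read_from_stream` and `can_read_from_local_file`, the `zip_member` parameter, and the use of a plain local path in place of a real URI are all assumptions made for the example.

```python
from __future__ import annotations

import tempfile
import zipfile
from typing import Any, BinaryIO


class SketchFormatter:
    """Hypothetical stand-in for a FormatterV2 subclass (not the real API)."""

    # Assumed flag names: each declares that one read method is implemented.
    can_read_from_uri = False
    can_read_from_stream = False
    can_read_from_local_file = False

    def read_from_uri(self, uri: str) -> Any:
        raise NotImplementedError

    def read_from_stream(self, stream: BinaryIO) -> Any:
        raise NotImplementedError

    def read_from_local_file(self, path: str) -> Any:
        raise NotImplementedError

    def read(self, path: str, zip_member: str | None = None) -> Any:
        """Pick a read method from the declared flags."""
        if zip_member is None:
            if self.can_read_from_uri:
                return self.read_from_uri(path)
            if self.can_read_from_stream:
                with open(path, "rb") as stream:
                    return self.read_from_stream(stream)
            if self.can_read_from_local_file:
                return self.read_from_local_file(path)
        else:
            # Content inside a zip archive: read_from_uri is never tried.
            with zipfile.ZipFile(path) as archive:
                if self.can_read_from_stream:
                    with archive.open(zip_member) as stream:
                        return self.read_from_stream(stream)
                if self.can_read_from_local_file:
                    # Extract to a temporary directory to get a real local file.
                    with tempfile.TemporaryDirectory() as tmpdir:
                        extracted = archive.extract(zip_member, path=tmpdir)
                        return self.read_from_local_file(extracted)
        raise RuntimeError("No usable read method is declared for this resource")


class TextStreamFormatter(SketchFormatter):
    """Example formatter that only knows how to read from a stream."""

    can_read_from_stream = True

    def read_from_stream(self, stream: BinaryIO) -> str:
        return stream.read().decode("utf-8")
```

With this sketch, `TextStreamFormatter().read("a.txt")` and `TextStreamFormatter().read("a.zip", zip_member="a.txt")` both go through `read_from_stream`, while a formatter that only sets `can_read_from_uri` fails for zipped content.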

For write:

to_bytes is called first (it may be left unimplemented).
If to_bytes is not implemented, the write falls back to write_local_file.

The default implementation of write_local_file calls to_bytes.
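A rough sketch of that write ordering is shown below. It is not the code in this PR: `SketchWriter`, its `write` method, the choice of `NotImplementedError` as the "not implemented" signal, and the temporary-file-then-move step are all assumptions made for illustration.

```python
from __future__ import annotations

import os
import tempfile
from typing import Any


class SketchWriter:
    """Hypothetical stand-in for the write side of a formatter (not the real API)."""

    def to_bytes(self, obj: Any) -> bytes:
        # Assumption: an unsupported to_bytes signals this with NotImplementedError.
        raise NotImplementedError("Serialization to bytes is not supported")

    def write_local_file(self, obj: Any, path: str) -> None:
        # Default implementation: serialize to bytes and write them out.
        data = self.to_bytes(obj)
        with open(path, "wb") as fd:
            fd.write(data)

    def write(self, obj: Any, destination: str) -> str:
        """Write obj to destination and return the name of the method used."""
        try:
            data = self.to_bytes(obj)
        except NotImplementedError:
            # Fallback: write a temporary local file, then move it into place.
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(destination) or ".")
            os.close(fd)
            try:
                self.write_local_file(obj, tmp)
                os.replace(tmp, destination)
            except Exception:
                if os.path.exists(tmp):
                    os.unlink(tmp)
                raise
            return "write_local_file"
        with open(destination, "wb") as out:
            out.write(data)
        return "to_bytes"


class BytesWriter(SketchWriter):
    """Example formatter that can serialize directly to bytes."""

    def to_bytes(self, obj: Any) -> bytes:
        return str(obj).encode("utf-8")


class FileOnlyWriter(SketchWriter):
    """Example formatter that can only write to a local file."""

    def write_local_file(self, obj: Any, path: str) -> None:
        with open(path, "w", encoding="utf-8") as fd:
            fd.write(f"file:{obj}")
```

In this sketch a formatter that overrides neither method fails with `NotImplementedError`, because the default `write_local_file` itself relies on `to_bytes`.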

All daf_butler formatters have been ported to V2. Some V1 test formatters remain, since otherwise nothing would test FormatterV1inV2.

This mostly seems to work but there are some things I still need to do:

There is some code duplication within FormatterV2 that I should tidy up.
I agreed to add any notes from sub-exceptions when I re-raise, so that the notes are more visible in the final stack trace.
Work out how to prevent storage class conversion before handing off to a formatter.
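For the exception-notes item, one possible approach is sketched below. It is only an illustration of copying notes onto a re-raised exception, not what this PR does; the helper names `add_note` and `reraise_with_notes` are hypothetical. It relies on the `__notes__` attribute that Python 3.11 introduced with `BaseException.add_note`, and sets the attribute by hand on older interpreters.

```python
from __future__ import annotations


def add_note(exc: BaseException, note: str) -> None:
    """Attach a note, using BaseException.add_note where it exists (Python 3.11+)."""
    if hasattr(exc, "add_note"):
        exc.add_note(note)
    else:
        # Older interpreters: maintain the __notes__ list by hand.
        exc.__notes__ = [*getattr(exc, "__notes__", []), note]


def reraise_with_notes(cause: Exception, message: str) -> None:
    """Raise a new RuntimeError from cause, carrying over any notes on cause."""
    new_exc = RuntimeError(message)
    for note in getattr(cause, "__notes__", []):
        add_note(new_exc, note)
    raise new_exc from cause
```

Calling `reraise_with_notes(e, "Failed to read dataset")` inside an `except` block keeps the original exception as `__cause__` and repeats its notes on the outer exception, where they are shown alongside the final error message.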

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes
(if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

codecov · 2024-05-18T05:08:40Z

Codecov Report

Attention: Patch coverage is 85.87571% with 100 lines in your changes missing coverage. Please review.

Project coverage is 89.37%. Comparing base (cdbbd14) to head (c4dd9bc).

Files	Patch %	Lines
python/lsst/daf/butler/_formatter.py	81.58%	40 Missing and 25 partials ⚠️
python/lsst/daf/butler/tests/_datasetsHelper.py	73.68%	2 Missing and 3 partials ⚠️
python/lsst/daf/butler/formatters/typeless.py	90.24%	3 Missing and 1 partial ⚠️
python/lsst/daf/butler/datastores/fileDatastore.py	86.36%	1 Missing and 2 partials ⚠️
python/lsst/daf/butler/formatters/file.py	0.00%	3 Missing ⚠️
python/lsst/daf/butler/formatters/parquet.py	88.46%	0 Missing and 3 partials ⚠️
python/lsst/daf/butler/formatters/yaml.py	85.71%	2 Missing and 1 partial ⚠️
python/lsst/daf/butler/_file_descriptor.py	50.00%	1 Missing and 1 partial ⚠️
python/lsst/daf/butler/_storage_class.py	75.00%	1 Missing and 1 partial ⚠️
...n/lsst/daf/butler/datastores/file_datastore/get.py	91.66%	1 Missing and 1 partial ⚠️
... and 5 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1018      +/-   ##
==========================================
- Coverage   89.55%   89.37%   -0.18%     
==========================================
  Files         358      359       +1     
  Lines       45277    45622     +345     
  Branches     9276     9348      +72     
==========================================
+ Hits        40548    40775     +227     
- Misses       3433     3521      +88     
- Partials     1296     1326      +30
