Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-45431: Update parquet formatter to use can_accept() #1059

Merged
merged 13 commits into from
Aug 19, 2024
Merged

Conversation

erykoff
Copy link
Contributor

@erykoff erykoff commented Aug 16, 2024

This PR additionally adds can_accept() support to delegates and the InMemoryDatastore.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes
  • (if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

@erykoff erykoff marked this pull request as draft August 16, 2024 21:27
@erykoff erykoff marked this pull request as ready for review August 17, 2024 22:27
@erykoff erykoff requested a review from timj August 17, 2024 22:27
Copy link

codecov bot commented Aug 17, 2024

Codecov Report

Attention: Patch coverage is 96.16858% with 10 lines in your changes missing coverage. Please review.

Project coverage is 89.55%. Comparing base (41ffdbb) to head (54a7d55).
Report is 14 commits behind head on main.

Files Patch % Lines
python/lsst/daf/butler/delegates/arrowtable.py 90.29% 3 Missing and 7 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1059      +/-   ##
==========================================
+ Coverage   89.53%   89.55%   +0.01%     
==========================================
  Files         361      357       -4     
  Lines       46200    46173      -27     
  Branches     9469     9486      +17     
==========================================
- Hits        41366    41348      -18     
+ Misses       3499     3488      -11     
- Partials     1335     1337       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I assume the tests demonstrate that the coercion is bypassed correctly and accuracy is retained?

if not delegate or not delegate.can_accept(inMemoryDataset):
inMemoryDataset = ref.datasetType.storageClass.coerce_type(inMemoryDataset)

if not self.constraints.isAcceptable(ref):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a bit more efficient to do this check before the coerce step.


import pyarrow as pa
from lsst.daf.butler import StorageClassDelegate
from lsst.utils.introspection import get_full_type_name
from lsst.utils.iteration import ensure_iterable

__all__ = ["ArrowTableDelegate"]
if TYPE_CHECKING:
import pandas
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We disable warnings from pandas bad imports in the mypy config so I think this will not be used in the mypy github action check because pandas isn't listed in types.txt.

python/lsst/daf/butler/formatters/parquet.py Show resolved Hide resolved
@@ -2212,5 +2162,81 @@ def testRowGroupSizeDataFrameWithLists(self):
self.assertGreater(row_group_size, 1_000_000)


def _checkAstropyTableEquality(table1, table2, skip_units=False, has_bigendian=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I wonder if this should replace assertAstropyTablesEqual in lsst.daf.butler.tests.utils.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't noticed this! Is it worth looking into merging these now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure your checks are more complete and so we may as well sync up the utility code. Not required now though.

@erykoff erykoff merged commit d09db1e into main Aug 19, 2024
18 checks passed
@erykoff erykoff deleted the tickets/DM-45431 branch August 19, 2024 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants