DM-41941: Handle all options for RemoteButler.get() inputs #923
Conversation
Codecov Report

@@ Coverage Diff @@
##             main     #923      +/-   ##
==========================================
+ Coverage   87.60%   87.63%   +0.03%
==========================================
  Files         295      295
  Lines       38254    38327      +73
  Branches     8088     8097       +9
==========================================
+ Hits        33511    33588      +77
+ Misses       3539     3535       -4
  Partials     1204     1204

View full report in Codecov by Sentry.
Force-pushed from ee6a532 to 61b7dee
Force-pushed from 96f5018 to c54ad78
Looks great. Nice cleanups.
@@ -69,17 +76,20 @@
    DataCoordinate or other data ID.
    """

SerializedDataId = dict[str, DataIdValue]
Oddly, `DataId` is defined below to use `Any` instead of `DataIdValue`.
You mean in this bit?

DataId = DataCoordinate | Mapping[str, Any]

Not really sure what that's about. DataCoordinate proper does seem to use `DataIdValue` pretty consistently everywhere. That `DataId` union exists mainly as a parameter type for Butler methods, I think, so maybe when you added mypy there was too much random code passing in stringifiable things that weren't actually strings. There is a bit of code in `standardize` doing things like coercing numpy.int64 to int, but not as much as I would have thought.

I kind of feel like there are values that aren't ints or strings, though. Isn't some stuff floats or datetimes?

I sort of suspect that the existing QuantumGraph code, or wherever this is getting used, just happens to not use anything unusual, and we may end up with some stuff getting accidentally coerced to strings at some point. In any case, I just pulled out an alias for the existing definition from SerializedDataCoordinate, so it's a problem for another day.
Dimensions do not use floats because you can't do equality comparisons on them. Dimension values are strings or ints (and, as you found, integral types, so they really should be `numbers.Integral` or something instead of `int`).
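The distinction being discussed can be sketched with the aliases themselves. A minimal illustration (the alias names follow the discussion above, but the coercion helper is hypothetical, mirroring the `numpy.int64`-to-`int` coercion mentioned for `standardize`):

```python
from collections.abc import Mapping
from typing import Any, Union
import numbers

# The value types dimensions actually use, per the discussion above.
DataIdValue = Union[str, int]

# The serialized form introduced in this PR.
SerializedDataId = dict[str, DataIdValue]

# Simplified version of the looser public-facing alias that uses Any.
DataId = Mapping[str, Any]


def coerce_data_id_value(value: Any) -> DataIdValue:
    """Hypothetical helper: coerce integral types (e.g. numpy.int64)
    to plain int, reject anything that is not a string or integer.
    """
    # Exclude bool explicitly: it is an Integral subtype in Python.
    if isinstance(value, numbers.Integral) and not isinstance(value, bool):
        return int(value)
    if isinstance(value, str):
        return value
    raise TypeError(f"Unsupported data ID value type: {type(value)!r}")
```

This is only a sketch of the typing question being debated, not the actual daf_butler definitions.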
-def direct(
-    cls, *, dataId: dict[str, DataIdValue], records: dict[str, dict] | None
-) -> SerializedDataCoordinate:
+def direct(cls, *, dataId: SerializedDataId, records: dict[str, dict] | None) -> SerializedDataCoordinate:
Maybe change the type of the dataId parameter in the docstring below (although last I looked Sphinx can get upset with type aliases).
-def _simplify_dataId(
-    self, dataId: DataId | None, **kwargs: dict[str, int | str]
-) -> SerializedDataCoordinate | None:
+def _simplify_dataId(self, dataId: DataId | None, kwargs: dict[str, int | str]) -> SerializedDataId:
Suggested change:

-def _simplify_dataId(self, dataId: DataId | None, kwargs: dict[str, int | str]) -> SerializedDataId:
+def _simplify_dataId(self, dataId: DataId | None, kwargs: dict[str, DataIdValue]) -> SerializedDataId:

?
    Returns
    -------
-    data_id : `SerializedDataCoordinate` or `None`
+    data_id : `SerializedDataId` or `None`
Suggested change:

-    data_id : `SerializedDataId` or `None`
+    data_id : `SerializedDataId`
dataset_id = datasetRefOrType.id
response = self._get(f"get_file/{dataset_id}")
if response.status_code == 404:
    raise LookupError(f"Dataset not found with ID {dataset_id}")
I know this was in the old code, but it might be better to print the `datasetRefOrType` here and not just the ID, since that is more human readable (and will include the ID).
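A hedged sketch of what that change might look like (the helper name is hypothetical; the surrounding variable names come from the snippet above):

```python
def raise_not_found(dataset_ref_or_type: object, dataset_id: str) -> None:
    """Hypothetical variant of the 404 branch above: include the full
    ref in the message, since its string form is more human readable
    than the bare UUID alone.
    """
    raise LookupError(
        f"Dataset not found: {dataset_ref_or_type} (ID {dataset_id})"
    )
```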
ref = DatasetRef.from_simple(
    self._parse_model(response, SerializedDatasetRef), universe=self.dimensions
)
if storage_class is not None:
Thanks for fixing this.
"""Convert DatasetType parameters in the format used by Butler methods | ||
to a standardized string name for the REST API. | ||
""" | ||
if isinstance(datasetTypeOrName, DatasetType): |
Would `return getattr(datasetTypeOrName, "name", datasetTypeOrName)` be more efficient?
Slightly, yeah, but less type safe and more ambiguous. I personally have no idea off the top of my head whether `str` or any str-like objects that may be passed in by a user have a `name` attribute or not.
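For what it's worth, plain strings have no `name` attribute, so the `getattr` fallback would happen to work here, but the `isinstance` check makes the intent explicit. A small illustration with a stand-in class (not the real `DatasetType`):

```python
from dataclasses import dataclass


@dataclass
class FakeDatasetType:
    """Stand-in for DatasetType, for illustration only."""
    name: str


def name_via_isinstance(dataset_type_or_name):
    # Explicit and type safe: only the known class is unwrapped.
    if isinstance(dataset_type_or_name, FakeDatasetType):
        return dataset_type_or_name.name
    return dataset_type_or_name


def name_via_getattr(dataset_type_or_name):
    # Shorter, but silently unwraps *any* object with a .name attribute.
    return getattr(dataset_type_or_name, "name", dataset_type_or_name)
```

Both return the same result for `str` and for the dataset-type class; the difference only shows up if some other object with a `name` attribute sneaks in.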
    )
    return _get_file_by_ref(butler, ref)
except LookupError as e:
    raise NotFoundException() from e
No error message?
It's not actually shipping the error message from those NotFoundExceptions to the client yet, and the `from e` will put the original error message in the server logs. I suppose I could have put in "DataId not found" or something.

In a couple of weeks, when I go in and set up the error propagation framework, that try block will go away entirely.
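The point about `from e` preserving the original message for the server logs can be demonstrated with a minimal stand-in for `NotFoundException` (class and message are illustrative, reusing the names from the snippet above):

```python
class NotFoundException(Exception):
    """Stand-in for the server's NotFoundException."""


def lookup_or_404():
    try:
        raise LookupError("DataId not found: {'instrument': 'HSC'}")
    except LookupError as e:
        # Even with no message on the new exception, `from e` keeps the
        # original LookupError as __cause__, so a server-side traceback
        # still shows the underlying error text.
        raise NotFoundException() from e
```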
tests/test_server.py (Outdated)

def check_sc_override(converted):
    self.assertNotEqual(type(metric), type(converted))
    self.assertIs(type(converted), new_sc.pytype)
Suggested change:

-    self.assertIs(type(converted), new_sc.pytype)
+    self.assertIsInstance(converted, new_sc.pytype)

?
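The difference matters if storage-class conversion ever returns a subclass: `assertIs(type(x), T)` demands the exact type, while `assertIsInstance` also accepts subclasses. A quick illustration with hypothetical stand-in classes:

```python
class MetricsExample:
    """Stand-in for the expected pytype of the new storage class."""


class ConvertedMetrics(MetricsExample):
    """Stand-in for a converted object that subclasses the expected type."""


converted = ConvertedMetrics()

# assertIsInstance semantics: passes for subclasses too.
is_instance_ok = isinstance(converted, MetricsExample)

# assertIs(type(...), ...) semantics: exact type match only.
exact_type_ok = type(converted) is MetricsExample
```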
@@ -81,6 +84,8 @@ def get_dataset_as_python_object(
]

ref = DatasetRef.from_simple(payload.dataset_ref, universe=universe)
if component is not None:
Maybe note that you have to apply the component override before the storage class override.
Adjusted the handling of dataId and kwargs in RemoteButler.find_datasets to mitigate a few issues:

1. There was a security vulnerability where the caller could smuggle an arbitrary unvalidated string, integer, or None value into any keyword argument of the server-side find_dataset function.
2. Augmentation of DataCoordinate values by kwargs was not working -- it only worked for the Mapping case of DataId.
3. It was sending and validating unused DimensionRecords from DataCoordinate. Removed them to avoid forwards-compatibility issues.
4. Fixed an issue where None for DataId was not correctly being passed to the server -- it would instead throw a Pydantic error on the client side. None is equivalent to an empty dict in find_datasets and get(), so the nullability was just removed here.
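The first issue -- unvalidated kwargs being forwarded verbatim to the server-side function -- is the kind of thing a type whitelist on the client can catch before serialization. A hedged sketch (the helper name is hypothetical, not the PR's actual implementation):

```python
def validate_data_id_kwargs(kwargs: dict) -> dict:
    """Hypothetical guard: reject any keyword value that is not a plain
    str or int, so arbitrary objects (or None) cannot be smuggled into
    the server-side call.
    """
    validated = {}
    for key, value in kwargs.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, (str, int)) or isinstance(value, bool):
            raise TypeError(f"Invalid data ID value for {key!r}: {value!r}")
        validated[key] = value
    return validated
```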
Implement storage class overrides via DatasetRef and DatasetType.

The StorageClass requested by the client may not be known to the server. There's no need for the server to get involved in StorageClass conversion at all, so this also simplifies the REST API.

This doesn't mutate the state of the server and takes only a single string parameter, so GET makes sense.
Force-pushed from c54ad78 to 166fc17
Checklist
doc/changes