Disallow re.Pattern in dataset type queries
RFC-879.
timj committed Aug 8, 2024
1 parent 7941936 commit 47941f4
Showing 5 changed files with 22 additions and 12 deletions.
3 changes: 1 addition & 2 deletions doc/lsst.daf.butler/queries.rst
@@ -26,11 +26,10 @@ Arguments that specify one or more dataset types can generally take any of the f
- `DatasetType` instances;
- `str` values (corresponding to `DatasetType.name`);
- `str` values using glob wildcard syntax which will be converted to `re.Pattern`;
-- `re.Pattern` values (matched to `DatasetType.name` strings, via `~re.Pattern.fullmatch`);
- iterables of any of the above;
- the special value "``...``", which matches all dataset types.

-Wildcards (`re.Pattern` and ``...``) are not allowed in certain contexts, such as `Registry.queryDataIds` and `Registry.queryDimensionRecords`, particularly when datasets are used only as a constraint on what is returned.
+Wildcards (globs and ``...``) are not allowed in certain contexts, such as `Registry.queryDataIds` and `Registry.queryDimensionRecords`, particularly when datasets are used only as a constraint on what is returned.
`Registry.queryDatasetTypes` can be used to resolve patterns before calling these methods when desired.
In these contexts, passing a dataset type or name that is not registered with the repository will result in `MissingDatasetTypeError` being raised, while contexts that do accept wildcards will typically ignore unregistered dataset types (for example, `Registry.queryDatasets` will return no datasets for these).
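
The glob matching described in this documentation hunk can be illustrated with a minimal sketch that uses only the standard library. It is not the butler's implementation: `fnmatch.translate` stands in for whatever glob-to-regex conversion the butler performs internally (an assumption), and `match_dataset_type_names` and the sample names are hypothetical.

```python
import fnmatch
import re


def match_dataset_type_names(glob: str, names: list[str]) -> list[str]:
    """Return the names that are fully matched by a glob pattern."""
    # Assumption: fnmatch.translate stands in for the butler's own
    # glob-to-regex conversion, which may differ in detail.
    pattern = re.compile(fnmatch.translate(glob))
    return [name for name in names if pattern.fullmatch(name)]


# Hypothetical dataset type names, for illustration only.
registered = ["metric", "metric2", "pvi", "raw"]
matches = match_dataset_type_names("metric*", registered)
print(matches)
```

Under these assumptions a caller who previously passed `re.compile("metric.*")` would pass the glob string `"metric*"` instead.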

6 changes: 2 additions & 4 deletions python/lsst/daf/butler/registry/_registry.py
@@ -1241,10 +1241,8 @@ def queryDataIds(
``exposure``, ``detector``, and ``physical_filter`` values to only
those for which at least one "raw" dataset exists in
``collections``. Allowed types include `DatasetType`, `str`,
-and iterables thereof. Regular expression objects (i.e.
-`re.Pattern`) are deprecated and will be removed after the v26
-release. See :ref:`daf_butler_dataset_type_expressions` for more
-information.
+and iterables thereof. See
+:ref:`daf_butler_dataset_type_expressions` for more information.
collections : collection expression, optional
An expression that identifies the collections to search for
datasets, such as a `str` (for full matches or partial matches
6 changes: 2 additions & 4 deletions python/lsst/daf/butler/registry/sql_registry.py
@@ -2148,10 +2148,8 @@ def queryDataIds(
``exposure``, ``detector``, and ``physical_filter`` values to only
those for which at least one "raw" dataset exists in
``collections``. Allowed types include `DatasetType`, `str`,
-and iterables thereof. Regular expression objects (i.e.
-`re.Pattern`) are deprecated and will be removed after the v26
-release. See :ref:`daf_butler_dataset_type_expressions` for more
-information.
+and iterables thereof. See
+:ref:`daf_butler_dataset_type_expressions` for more information.
collections : collection expression, optional
An expression that identifies the collections to search for
datasets, such as a `str` (for full matches or partial matches
10 changes: 8 additions & 2 deletions python/lsst/daf/butler/registry/wildcards.py
@@ -434,7 +434,6 @@ def from_expression(cls, expression: Any) -> DatasetTypeWildcard:
- a `str` dataset type name;
- a `DatasetType` instance;
-- a `re.Pattern` to match against dataset type names;
- an iterable whose elements may be any of the above (any dataset
type matching any element in the list is an overall match);
- an existing `DatasetTypeWildcard` instance;
@@ -455,9 +454,16 @@ def from_expression(cls, expression: Any) -> DatasetTypeWildcard:
"""
if isinstance(expression, cls):
return expression
+# CategorizedWildcard currently allows globs and regex as patterns
+# but RFC-879 drops support for regex in dataset type specifications.
+# Therefore check for their presence.
+for exp in ensure_iterable(expression):
+if isinstance(exp, re.Pattern):
+raise DatasetTypeExpressionError("Regular expressions are not supported.")
try:
wildcard = CategorizedWildcard.fromExpression(
-expression, coerceUnrecognized=lambda d: (d.name, d)
+expression,
+coerceUnrecognized=lambda d: (d.name, d),
)
except TypeError as err:
raise DatasetTypeExpressionError(f"Invalid dataset type expression: {expression!r}.") from err
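
The check added in this hunk can be exercised in isolation with a minimal, self-contained sketch. It does not import the butler: the `DatasetTypeExpressionError` class and the `ensure_iterable` helper below are simplified local stand-ins for the real ones (assumptions about their behavior, not their actual definitions), and `reject_regex` is a hypothetical name for the loop shown in the diff.

```python
import re
from collections.abc import Iterable, Iterator
from typing import Any


class DatasetTypeExpressionError(Exception):
    """Local stand-in for the butler exception of the same name."""


def ensure_iterable(value: Any) -> Iterator[Any]:
    """Simplified stand-in: yield elements, treating `str` as one item."""
    if isinstance(value, str) or not isinstance(value, Iterable):
        yield value
    else:
        yield from value


def reject_regex(expression: Any) -> None:
    """Raise if any element of the expression is a compiled regex."""
    for exp in ensure_iterable(expression):
        if isinstance(exp, re.Pattern):
            raise DatasetTypeExpressionError("Regular expressions are not supported.")


reject_regex("metric*")           # A glob string passes the check.
reject_regex(["pvi", "metric*"])  # So does an iterable of strings.

try:
    # A compiled regex anywhere in the expression is rejected.
    reject_regex(["pvi", re.compile("metric.*")])
except DatasetTypeExpressionError as err:
    message = str(err)
```

Because the loop runs before the expression is categorized, a single `re.Pattern` inside an otherwise valid list is enough to reject the whole expression, which is the case the new test in `tests/test_butler.py` covers.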
9 changes: 9 additions & 0 deletions tests/test_butler.py
@@ -36,6 +36,7 @@
import pickle
import posixpath
import random
+import re
import shutil
import string
import tempfile
@@ -94,6 +95,7 @@ def mock_aws(*args: Any, **kwargs: Any) -> Any: # type: ignore[no-untyped-def]
CollectionTypeError,
ConflictingDefinitionError,
DataIdValueError,
+DatasetTypeExpressionError,
MissingCollectionError,
OrphanedRecordError,
)
@@ -1142,6 +1144,13 @@ def testGetDatasetTypes(self) -> None:
fromRegistry.update(parent_dataset_type.makeAllComponentDatasetTypes())
self.assertEqual({d.name for d in fromRegistry}, datasetTypeNames | components)

+# Query with wildcard.
+dataset_types = butler.registry.queryDatasetTypes("metric*")
+self.assertEqual(len(dataset_types), 4, f"Got: {dataset_types}")
+# but not regex.
+with self.assertRaises(DatasetTypeExpressionError):
+butler.registry.queryDatasetTypes(["pvi", re.compile("metric.*")])

# Now that we have some dataset types registered, validate them
butler.validateConfiguration(
ignore=[