-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-45680: Allow boolean columns to be used in query 'where' #1051
Conversation
56f222d
to
5b4498c
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1051 +/- ##
==========================================
+ Coverage 89.40% 89.44% +0.04%
==========================================
Files 360 360
Lines 45733 45844 +111
Branches 9382 9406 +24
==========================================
+ Hits 40888 41006 +118
+ Misses 3519 3506 -13
- Partials 1326 1332 +6 ☔ View full report in Codecov by Sentry. |
@classmethod | ||
def from_bool_expression(cls, value: ColumnExpression) -> Predicate: | ||
"""Construct a predicate that wraps a boolean ColumnExpression, taking | ||
on the value of the underlying ColumnExpression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had hoped to never have to allow a boolean ColumnExpression
to be constructed (going to Predicate
immediately instead). But I think I've figured out why you didn't do that, and I agree with it: we'd need Predicate
-subclass versions of DimensionFieldReference
and DatasetFieldReference
, and we'd need to include them in all of the cases where column references of other types already appear, and that'd be a much bigger problem. Is that right?
If so, I think instead we need to be extra vigilant about putting those column expressions in their predicate wrappers as early as possible, and keeping boolean operators out of those column expressions, because that would subvert all of the boolean-expression normalization and reasoning we do. I suspect typing is already going to do prevent problems pretty effectively, since we don't have any boolean-operator expressions, but we could lock it down a bit further by declaring that value
here (and in BooleanWrapper
) must be ColumnReference
, not merely ColumnExpression
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking wasn't actually that involved, I just observed that:
- Boolean ColumnExpressions already existed, and it didn't occur to me to go out of my way to create a separate representation for boolean SQL columns than the one used for int or string.
- Most of the other predicates worked by wrapping one or more column expressions with a little bit of information specifying how they convert to booleans -- and "no-op" seemed like a reasonable addition to the set of conversions.
So I just went with the grain of what was already there and it slotted in easily with minimal code.
I do share your concern about this problem potentially coming back since any place that references a boolean column will need to do the same thing, but that's sort of inherent to treating booleans as a special case. Changing the type is a good suggestion, I'll do that.
I suspect handling boolean_column = NULL
is going to get ugly though so I may need to figure something else out -- or just not support it, or special case it to only work for a single boolean column and not arbitrary boolean expressions.
8c4c852
to
70730b3
Compare
def is_null(self) -> tree.Predicate: | ||
return ResolvedScalarExpressionProxy(self._boolean_expression).is_null | ||
|
||
def as_boolean(self) -> tree.Predicate: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like forcing users to call as_boolean()
to get a Predicate out of this, but I didn't really have a better idea.
We can't just return a Predicate directly from DimensionElementProxy.__getattr__
because it would completely destroy the static typing. The return type would become ScalarExpressionProxy | Predicate
and those types have almost no methods in common.
This at least throws a semi-helpful runtime error if you fail to call as_boolean()
or attempt to do a comparison like boolean_column == 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really wish Python typing had something like Any[A | B]
, where it'd happily do an implicit cast to one of the options you give it without losing the typing entirely (the way Any
does).
I'm not convinced the static typing is so important here that it's worth doing even slight harm to the runtime interface - I'm imagining users in Jupyter notebooks doing tab completion as the most important use case for ExpressionFactory
. But I do expect boolean columns to be very rare, and this is a small change that's easy to revisit later, and the same cannot be said of trying to put the entire Predicate
interface into ScalarExpressionProxy
.
70730b3
to
128cf76
Compare
Fix an issue where boolean metadata columns (like exposure.can_see_sky and exposure.has_simulated) were not usable in "where" clauses for Registry query functions. These column names can now be used as a boolean expression, for example `where="exposure.can_see_sky"` or `where="NOT exposure.can_see_sky"`.
We need roughly the same test for both registry and the new query system, so move it out of the registry tests to a place where we have access to both query systems.
Add an as_boolean() method to the expression proxies used to reference dimension fields, so boolean-valued fields can be converted to Predicate instances that can be used to constrain a query.
Boolean columns in the DB are always nullable (and exposure.can_see_sky has many NULLs in the real databases.) So add a way to search for them in the query system.
Add support for `= NULL` and `!= NULL` syntax for boolean columns in query where strings.
128cf76
to
7aa86be
Compare
def is_null(self) -> tree.Predicate: | ||
return ResolvedScalarExpressionProxy(self._boolean_expression).is_null | ||
|
||
def as_boolean(self) -> tree.Predicate: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really wish Python typing had something like Any[A | B]
, where it'd happily do an implicit cast to one of the options you give it without losing the typing entirely (the way Any
does).
I'm not convinced the static typing is so important here that it's worth doing even slight harm to the runtime interface - I'm imagining users in Jupyter notebooks doing tab completion as the most important use case for ExpressionFactory
. But I do expect boolean columns to be very rare, and this is a small change that's easy to revisit later, and the same cannot be said of trying to put the entire Predicate
interface into ScalarExpressionProxy
.
Fix an issue where boolean metadata columns (like exposure.can_see_sky and exposure.has_simulated) were not usable in "where" clauses for query functions.
These column names can now be used as a boolean expression, for example
where="exposure.can_see_sky
orwhere="NOT exposure.can_see_sky"
.In the new query system's
ExpressionFactory
, you can convert a boolean column to a usable boolean Predicate by callingas_boolean()
.Checklist
doc/changes
configs/old_dimensions