Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering dates in SQL using BETWEEN raises a ComputeError #16275

Closed
2 tasks done
graydenshand opened this issue May 16, 2024 · 1 comment · Fixed by #16279
Closed
2 tasks done

Filtering dates in SQL using BETWEEN raises a ComputeError #16275

graydenshand opened this issue May 16, 2024 · 1 comment · Fixed by #16279
Assignees
Labels
A-sql Area: Polars SQL functionality bug Something isn't working python Related to Python Polars

Comments

@graydenshand
Copy link

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
from datetime import datetime, date, timezone

df = pl.DataFrame({
    "idx": [0, 1, 2],
    "dtm": [
        datetime(2024, 1, 7, 1, 2, 3, 123456, tzinfo=timezone.utc),
        datetime(2006, 1, 1, 23, 59, 59, 555555, tzinfo=timezone.utc),
        datetime(2020, 12, 30, 10, 30, 45, 987654, tzinfo=timezone.utc),
    ],
    "dt": [
        date(2020, 12, 30),
        date(2077, 1, 1),
        date(1960, 1, 7),
    ],
})

# Succeeds
print(df.sql("SELECT * FROM self WHERE dtm >= '2024-01-07 01:02:03+00:00' AND dtm <= '2025-01-01 01:02:04+00:00'"))

# Fails with ComputeError
print(df.sql("SELECT * FROM self WHERE dtm BETWEEN '2024-01-07 01:02:03+00:00' AND '2025-01-01 01:02:04+00:00'"))

Log output

Traceback (most recent call last):
  File "/Users/gshand/code/timely/polars_query.py", line 22, in <module>
    print(df.sql("SELECT * FROM self WHERE dtm BETWEEN '2024-01-07 01:02:03+00:00' AND '2025-01-01 01:02:04+00:00'"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gshand/code/timely/.venv/lib/python3.12/site-packages/polars/dataframe/frame.py", line 4477, in sql
    return ctx.execute(query)  # type: ignore[return-value]
           ^^^^^^^^^^^^^^^^^^
  File "/Users/gshand/code/timely/.venv/lib/python3.12/site-packages/polars/sql/context.py", line 270, in execute
    return res.collect() if (eager or self._eager_execution) else res
           ^^^^^^^^^^^^^
  File "/Users/gshand/code/timely/.venv/lib/python3.12/site-packages/polars/lazyframe/frame.py", line 1816, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: cannot compare 'date/datetime/time' to a string value (create native python { 'date', 'datetime', 'time' } or compare to a temporal column)

This error occurred with the following context stack:
        [1] 'filter' failed
        [2] 'select' input failed to resolve

Issue description

Closely related to #15956. Support for implicit string-> temporal conversion in sql queries was recently added (#15958).

However, it appears this only works when using inequality operators such as >. When using the BETWEEN keyword a ComputeError is still raised.

Expected behavior

Query succeeds whether comparing using inequality operators or the BETWEEN keyword.

Installed versions

--------Version info---------
Polars:               0.20.26
Index type:           UInt32
Platform:             macOS-14.4.1-arm64-arm-64bit
Python:               3.12.0 (main, Nov 11 2023, 13:36:14) [Clang 15.0.0 (clang-1500.0.40.1)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.3.1
gevent:               24.2.1
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              14.0.2
pydantic:             2.7.1
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.29
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@graydenshand graydenshand added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels May 16, 2024
@alexander-beedie
Copy link
Collaborator

I'll take a look; in the meantime I recommend using the form 'yyyy-mm-dd'::date and 'yyyy-mm-dd hh:mm:ss'::datetime to avoid ambiguity and work around places where we haven't detected the string dates/datetimes 👌

@alexander-beedie alexander-beedie added A-sql Area: Polars SQL functionality and removed needs triage Awaiting prioritization by a maintainer labels May 16, 2024
@alexander-beedie alexander-beedie self-assigned this May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql Area: Polars SQL functionality bug Something isn't working python Related to Python Polars
Projects
None yet
2 participants