Inconsistent COUNT(*) and GROUP BY behavior in Polars CLI #71

Open
2 tasks done
kwkeefer opened this issue Sep 20, 2024 · 0 comments

Labels
bug Something isn't working

kwkeefer commented Sep 20, 2024

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of the Polars CLI.

Reproducible example

# generate test.csv
cat<<EOF > test.csv
a
test
test
test2
test3
EOF

# run group by query
echo "SELECT COUNT(*) AS _count, a FROM read_csv('test.csv') GROUP BY a;" | polars

Output

┌────────┬───────┐
│ _count ┆ a     │
│ ---    ┆ ---   │
│ u32    ┆ str   │
╞════════╪═══════╡
│ 3      ┆ test2 │
│ 3      ┆ test3 │
│ 3      ┆ test  │
└────────┴───────┘
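You can work out the numbers the query should have produced without using Polars at all. The following is a minimal sketch that uses only the Python standard library and inlines the CSV contents instead of reading test.csv from disk. It tallies 2, 1 and 1 rows for test, test2 and test3. It also shows that the file has 4 data rows but only 3 distinct values of a. The 3 in every row of the CLI output therefore matches the number of groups, not the total row count.

```python
import csv
import io
from collections import Counter

# Same contents as the test.csv generated by the heredoc, inlined here.
CSV_TEXT = "a\ntest\ntest\ntest2\ntest3\n"

# Read column "a" from every data row (DictReader consumes the header line).
values = [row["a"] for row in csv.DictReader(io.StringIO(CSV_TEXT))]

# Count rows per distinct value of "a", i.e. what COUNT(*) ... GROUP BY a should return.
per_group = Counter(values)

print("total data rows:", len(values))       # 4
print("number of groups:", len(per_group))   # 3
print("rows per group:", dict(per_group))    # {'test': 2, 'test2': 1, 'test3': 1}
```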

Issue description

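COUNT(*) does not appear to respect the GROUP BY. Every group reports 3, but test.csv contains 2 rows where a is "test" and 1 row each for "test2" and "test3". The file has 4 data rows, so 3 is not the total row count either. It equals the number of groups.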

Expected behavior

import polars as pl

df = pl.read_csv('test.csv')

with pl.SQLContext(register_globals=True, eager=True) as ctx:
    df_small = ctx.execute("SELECT COUNT(*) AS _count, a FROM df GROUP BY a")
    print(df_small)

python3 polarstest.py
shape: (3, 2)
┌────────┬───────┐
│ _count ┆ a     │
│ ---    ┆ ---   │
│ u32    ┆ str   │
╞════════╪═══════╡
│ 2      ┆ test  │
│ 1      ┆ test3 │
│ 1      ┆ test2 │
└────────┴───────┘

Installed version

0.8.0

kwkeefer added the bug label on Sep 20, 2024.