
feat: add keep_column[s] params to to_dummies #14844

Open: wants to merge 6 commits into base: main


mcrumiller (Contributor) commented Mar 4, 2024

Resolves #14831.

This was fairly easy to implement, so no harm if it's rejected. I'm not super happy about having different parameter names for Series and DataFrame (keep_column vs. keep_columns); any suggested alternatives? keep_original would work but sounds ugly to me.

>>> import polars as pl
>>> pl.Series("a", [1, 2, 3]).to_dummies(keep_column=True)
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ a_1 ┆ a_2 ┆ a_3 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ u8  ┆ u8  ┆ u8  │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 1   ┆ 0   ┆ 0   │
│ 2   ┆ 0   ┆ 1   ┆ 0   │
│ 3   ┆ 0   ┆ 0   ┆ 1   │
└─────┴─────┴─────┴─────┘

>>> pl.DataFrame({"a": [1, 2], "b": [3, 4]}).to_dummies(keep_columns=True)
shape: (2, 6)
┌─────┬─────┬─────┬─────┬─────┬─────┐
│ a   ┆ a_1 ┆ a_2 ┆ b   ┆ b_3 ┆ b_4 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ u8  ┆ u8  ┆ i64 ┆ u8  ┆ u8  │
╞═════╪═════╪═════╪═════╪═════╪═════╡
│ 1   ┆ 1   ┆ 0   ┆ 3   ┆ 1   ┆ 0   │
│ 2   ┆ 0   ┆ 1   ┆ 4   ┆ 0   ┆ 1   │
└─────┴─────┴─────┴─────┴─────┴─────┘
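keep_column / keep_columns are the parameters proposed in this (still open) PR, so the snippet below is plain Python rather than a polars call. It is a minimal sketch of the intended semantics only, not polars' implementation: the functions series_to_dummies and frame_to_dummies are hypothetical, columns are modeled as a dict of lists, and dummy columns are ordered by sorted value, which matches the examples above.

```python
from typing import Any


def series_to_dummies(
    name: str,
    values: list[Any],
    *,
    keep_column: bool = False,
    separator: str = "_",
) -> dict[str, list[Any]]:
    """Hypothetical one-hot encoding of a single column.

    Dummy columns are named f"{name}{separator}{value}" and hold 1/0 flags.
    With keep_column=True the original column is emitted first.
    """
    out: dict[str, list[Any]] = {}
    if keep_column:
        out[name] = list(values)
    # One dummy column per distinct value, in sorted order (sketch assumption).
    for value in sorted(set(values)):
        out[f"{name}{separator}{value}"] = [int(v == value) for v in values]
    return out


def frame_to_dummies(
    frame: dict[str, list[Any]],
    *,
    keep_columns: bool = False,
    separator: str = "_",
) -> dict[str, list[Any]]:
    """Hypothetical DataFrame variant: encode every column, left to right.

    Name clashes are not checked in this sketch; a later column with the
    same name would silently overwrite an earlier one.
    """
    out: dict[str, list[Any]] = {}
    for name, values in frame.items():
        out.update(
            series_to_dummies(
                name, values, keep_column=keep_columns, separator=separator
            )
        )
    return out


print(frame_to_dummies({"a": [1, 2], "b": [3, 4]}, keep_columns=True))
# {'a': [1, 2], 'a_1': [1, 0], 'a_2': [0, 1], 'b': [3, 4], 'b_3': [1, 0], 'b_4': [0, 1]}
```

Building the DataFrame version as a loop over the Series version keeps each original column immediately before its own dummies, giving the a, a_1, a_2, b, b_3, b_4 order shown above.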

mcrumiller changed the title from "feat(python, rust): add keep-column[s] params to to_dummies" to "feat(python, rust): add keep_column[s] params to to_dummies" on Mar 4, 2024
github-actions bot added the labels enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars) and rust (Related to Rust Polars) on Mar 4, 2024
mcrumiller marked this pull request as ready for review on March 4, 2024 21:38
mcrumiller (Contributor, Author)

I'm unsure why the code coverage check failed; I did add coverage tests.

s-banach (Contributor) commented Mar 4, 2024

What happens if one of the new dummy columns has the same name as the original column?

mcrumiller (Contributor, Author) commented Mar 4, 2024

> What happens if one of the new dummy columns has the same name as the original column?

That would have to be very contrived. Dummy columns are named [original_name]_[value], so a collision requires a column with exactly that name to already exist, and that issue is already present in the existing implementation:

import polars as pl

df = pl.DataFrame({
    "a": [1, 2, 3],
    "a_1": [1, 2, 3],
})
df.to_dummies("a")
polars.exceptions.DuplicateError: unable to hstack, column with name "a_1" already exists

My guess is that one can contrive many scenarios that collide with polars' renaming, but it's not in our best interest to fight those edge cases.
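The collision argument can be checked with a small pure-Python model of the naming scheme. This is a sketch, not polars code: dummy_names and hstack_names are hypothetical helpers, and the ValueError only imitates the wording of the DuplicateError quoted above. It reproduces the contrived "a_1" clash and also shows why keeping the original column cannot introduce a new clash: with a non-empty separator, every dummy name is strictly longer than the original name.

```python
def dummy_names(name: str, values: list, separator: str = "_") -> list[str]:
    """Hypothetical helper: dummy column names for one column, following
    the [original_name]_[value] scheme (sorted by value in this sketch)."""
    return [f"{name}{separator}{v}" for v in sorted(set(values))]


def hstack_names(existing: list[str], new: list[str]) -> list[str]:
    """Append new column names, raising on any clash (imitates the
    'unable to hstack' error; not polars' actual implementation)."""
    result = list(existing)
    for col in new:
        if col in result:
            raise ValueError(
                f'unable to hstack, column with name "{col}" already exists'
            )
        result.append(col)
    return result


# The contrived case from the comment: "a_1" already exists next to "a".
columns = {"a": [1, 2, 3], "a_1": [1, 2, 3]}
remaining = [c for c in columns if c != "a"]  # to_dummies("a") drops "a"
try:
    hstack_names(remaining, dummy_names("a", columns["a"]))
except ValueError as exc:
    print(exc)  # unable to hstack, column with name "a_1" already exists

# Keeping "a" adds no clash: "a" + "_" + value is always longer than "a".
assert all(n != "a" for n in dummy_names("a", columns["a"]))
```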

s-banach (Contributor) commented Mar 4, 2024

As long as it raises the same error.

codecov bot commented Mar 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.19%. Comparing base (dcee934) to head (b56daed).
Report is 5 commits behind head on main.

@@            Coverage Diff             @@
##             main   #14844      +/-   ##
==========================================
+ Coverage   81.14%   81.19%   +0.04%     
==========================================
  Files        1363     1367       +4     
  Lines      175282   175326      +44     
  Branches     2527     2527              
==========================================
+ Hits       142236   142350     +114     
+ Misses      32568    32496      -72     
- Partials      478      480       +2     
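The headline percentages can be recomputed from the table above. This sketch assumes Codecov's definition coverage = hits / (hits + misses + partials), so partial lines count as uncovered, and that the displayed values are truncated to two decimals. The truncation is an inference, not documented here: it reproduces all three headline numbers, whereas rounding would give 81.15% for main.

```python
import math
from fractions import Fraction


def coverage(hits: int, misses: int, partials: int) -> Fraction:
    """Exact coverage percentage; partials count as not covered (assumption)."""
    return Fraction(hits, hits + misses + partials) * 100


def truncate2(x: Fraction) -> float:
    """Truncate to two decimals (assumed display convention)."""
    return math.floor(x * 100) / 100


base = coverage(142236, 32568, 478)  # main: 175282 lines
head = coverage(142350, 32496, 480)  # this PR: 175326 lines
print(truncate2(base), truncate2(head), truncate2(head - base))
# 81.14 81.19 0.04
```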


mcrumiller changed the title from "feat(python, rust): add keep_column[s] params to to_dummies" to "feat: add keep_column[s] params to to_dummies" on Mar 12, 2024

Successfully merging this pull request may close these issues.

add keep_column parameter to DataFrame.to_dummies
2 participants