feat: Add float_scientific option to write_csv/sink_csv #17111

Merged: ritchie46 merged 3 commits into pola-rs:main on Jun 24, 2024


lukeshingles (Contributor) commented Jun 21, 2024

Closes #11929.

This PR adds an optional boolean float_scientific keyword argument to write_csv and sink_csv. It accepts None (the default, which keeps the previous automatic behaviour), True (always use scientific notation), or False (always use positional notation).

Prior to this PR, write_csv chose scientific or positional notation automatically for each float value, unless float_precision was set, in which case positional notation was always used. In my own application, I want to output in scientific notation (since values can be extremely small, e.g. 10^-15) but use float_precision=4 to keep only a few significant figures and reduce the output file size. Performing the formatting in Python is about 15x slower than this solution.
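
To illustrate the motivation above, here is a minimal sketch using Python's built-in format specifiers rather than the Polars CSV writer. The sample value is hypothetical, and the exact text Polars writes may differ from Python's formatting (for example in how the exponent is padded); the point is only that fixed precision in positional notation discards a very small value, while the same precision in scientific notation keeps its leading digits.

```python
# Illustration only: Python's built-in formatting, not the Polars CSV writer.
value = 1.23456789e-15  # hypothetical "extremely small" value

# Four digits after the decimal point, positional notation.
positional = format(value, ".4f")
# Four digits after the decimal point, scientific notation.
scientific = format(value, ".4e")

print(positional)  # 0.0000 (every significant digit is lost)
print(scientific)  # 1.2346e-15

# With this PR, the corresponding Polars call would look like this,
# assuming an existing DataFrame named `df` and an output path of your choice:
# df.write_csv("out.csv", float_scientific=True, float_precision=4)
```

Both strings are short, but only the scientific one still carries information about the value, which is why combining float_precision with float_scientific is useful for compact output.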

github-actions bot added the enhancement, python, and rust labels Jun 21, 2024
lukeshingles marked this pull request as ready for review June 21, 2024 12:43
stinodego changed the title to "feat: Add float_scientific option to write_csv/sink_csv" Jun 21, 2024

codecov bot commented Jun 22, 2024

Codecov Report

Attention: Patch coverage is 80.26316% with 15 lines in your changes missing coverage. Please review.

Project coverage is 80.88%. Comparing base (8a6bf4b) to head (8771d8f).
Report is 17 commits behind head on main.

| Files | Patch % | Lines |
| --- | --- | --- |
| ...s/polars-io/src/csv/write/write_impl/serializer.rs | 76.19% | 15 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17111      +/-   ##
==========================================
+ Coverage   80.86%   80.88%   +0.02%     
==========================================
  Files        1456     1456              
  Lines      191141   191410     +269     
  Branches     2728     2739      +11     
==========================================
+ Hits       154562   154824     +262     
- Misses      36073    36079       +6     
- Partials      506      507       +1     


lukeshingles (Contributor Author) commented Jun 22, 2024

> Codecov Report: Patch coverage is 80.26316% with 15 lines in your changes missing coverage.

The equivalent lines in the existing code were already not covered by tests, so the similar functions I added contribute more uncovered lines. I don't think it is possible to trigger this error condition through the Python API.

ritchie46 (Member) commented

Alright. Thanks @lukeshingles

ritchie46 merged commit 0759e5f into pola-rs:main Jun 24, 2024
27 checks passed
Wainberg (Contributor) commented

@lukeshingles this only partially addresses the original issue (#11929) that I raised. There's still no pure-Rust equivalent of .with_columns(pl.selectors.float().map_elements('{:.12g}'.format)). float_scientific=True is like Python's e format specifier, but having the ability to switch to scientific notation for only numbers below a certain magnitude (which is 1e-4 for g) is very handy for brevity and readability. I personally won't be using float_scientific=True, but would love to use a pure-Rust equivalent to g.
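
The difference between the two specifiers mentioned above can be sketched in plain Python. The sample values are hypothetical and chosen to sit on either side of the 1e-4 threshold; this shows Python's own formatting behaviour only and says nothing about what the Polars writer emits.

```python
# Illustration only: compares Python's 'g' and 'e' format specifiers.
values = [123456.789, 0.0001, 0.00001]  # hypothetical sample values

# 'g' keeps positional notation for moderate magnitudes and switches to
# scientific notation only when the exponent is below -4 (or too large
# for the requested precision); trailing zeros are stripped.
g_style = [format(v, ".12g") for v in values]

# 'e' always uses scientific notation with a fixed number of digits.
e_style = [format(v, ".12e") for v in values]

print(g_style)  # ['123456.789', '0.0001', '1e-05']
print(e_style)
```

The 'g' output stays short and readable for ordinary values, whereas 'e' pads every value to the full precision in scientific notation, which is the brevity trade-off described in the comment.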

Labels
enhancement, python, rust
Development

Successfully merging this pull request may close these issues.

Support scientific notation in write_csv()
3 participants