GH-41815: [Python] add use_threads option to feather.write_feather() #41820

Closed
wants to merge 4 commits into from


@pjh40 commented May 24, 2024

Rationale for this change

`feather.write_feather` can implicitly start a `concurrent.futures.ThreadPoolExecutor` when writing a `pandas.DataFrame`, which makes it unusable in an `atexit` handler. More details in #41815.
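
A minimal, stdlib-only sketch of the underlying failure, assuming CPython 3.9 or later (where `concurrent.futures` marks itself as shut down before `atexit` handlers run). It does not call pyarrow: a plain `ThreadPoolExecutor` stands in for the one a threaded DataFrame conversion may create. The handler runs in a child interpreter so that its exit-time output can be captured.

```python
import subprocess
import sys

# Child script: an atexit handler that tries to use a ThreadPoolExecutor,
# standing in for a threaded DataFrame -> Arrow conversion at exit time.
CHILD = """
import atexit
from concurrent.futures import ThreadPoolExecutor

def save_on_exit():
    try:
        with ThreadPoolExecutor(max_workers=2) as pool:
            pool.submit(sum, [1, 2, 3]).result()
        print("threaded work succeeded")
    except RuntimeError as exc:
        print("RuntimeError:", exc)

atexit.register(save_on_exit)
"""


def run_child() -> str:
    """Run CHILD in a fresh interpreter and return what it printed."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip()


if __name__ == "__main__":
    print(run_child())
```

On the interpreters assumed above, `submit()` refuses to schedule work once interpreter shutdown has begun, so the handler reports a `RuntimeError` instead of running the task.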

What changes are included in this PR?

Added a new parameter `use_threads` (default `True`) to `write_feather`; the name matches the existing `use_threads` parameter of `read_feather`. With `use_threads=True`, `write_feather` passes `nthreads=None` to `Table.from_pandas`, which lets `pandas_compat.dataframe_to_arrays()` decide heuristically whether to use threads. With `use_threads=False`, it passes `nthreads=1`, so the conversion runs serially.
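
The parameter mapping described above can be modeled in a few lines of plain Python. This is a hedged sketch, not the PR's code: `nthreads_for_write` is a hypothetical helper, and `resolve_nthreads` is our approximate reading of how `dataframe_to_arrays()` resolves `nthreads=None` (use all CPUs only for frames with more than one column and more than 100 rows per column). The threshold may differ between pyarrow versions, and `os.cpu_count()` stands in for `pyarrow.cpu_count()`.

```python
import os
from typing import Optional


def nthreads_for_write(use_threads: bool) -> Optional[int]:
    """Hypothetical helper: translate use_threads into the nthreads
    argument that would be passed to Table.from_pandas."""
    # None lets the conversion pick a thread count heuristically;
    # 1 forces serial conversion, so no thread pool is created.
    return None if use_threads else 1


def resolve_nthreads(nthreads: Optional[int], nrows: int, ncols: int,
                     cpu_count: Optional[int] = None) -> int:
    """Approximate model of how nthreads=None is resolved (assumption)."""
    if nthreads is not None:
        return nthreads
    if cpu_count is None:
        # Stand-in for pyarrow.cpu_count().
        cpu_count = os.cpu_count() or 1
    # Threads are only used for "tall" frames with several columns.
    if nrows > ncols * 100 and ncols > 1:
        return cpu_count
    return 1


if __name__ == "__main__":
    for use_threads in (True, False):
        n = resolve_nthreads(nthreads_for_write(use_threads),
                             nrows=1_000_000, ncols=4, cpu_count=8)
        print(f"use_threads={use_threads} -> nthreads={n}")
```

Under this model, `use_threads=True` only spins up a thread pool for large, multi-column frames, which is why the `atexit` failure can appear for some DataFrames and not others.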

Are these changes tested?

Yes. `test_use_threads` in `test_feather.py` was extended to check that serial and threaded writes produce identical files.
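
The invariant behind the extended test (the same output whether or not threads are used) can be illustrated without pyarrow. In this stdlib-only sketch, `convert_column` and `convert_columns` are hypothetical stand-ins for per-column DataFrame conversion: the serial path is taken when `nthreads == 1` and never creates an executor, while other values use a `ThreadPoolExecutor`. The test checks that both paths yield byte-identical output.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def convert_column(values: List[int]) -> bytes:
    """Hypothetical per-column conversion: encode one column as bytes."""
    return json.dumps(values).encode("utf-8")


def convert_columns(columns: Dict[str, List[int]], nthreads: int) -> bytes:
    """Convert all columns serially or on a thread pool, then join them."""
    names = list(columns)
    if nthreads == 1:
        # Serial path: no thread pool is created (safe in atexit handlers).
        chunks = [convert_column(columns[n]) for n in names]
    else:
        with ThreadPoolExecutor(max_workers=nthreads) as pool:
            # Executor.map yields results in input order, so the output
            # does not depend on thread scheduling.
            chunks = list(pool.map(convert_column, (columns[n] for n in names)))
    return b"\n".join(chunks)


def test_serial_and_threaded_match() -> None:
    columns = {f"col{i}": list(range(i, i + 1000)) for i in range(8)}
    assert convert_columns(columns, nthreads=1) == convert_columns(columns, nthreads=4)


if __name__ == "__main__":
    test_serial_and_threaded_match()
    print("serial and threaded output match")
```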

Are there any user-facing changes?

Yes: the user-facing `write_feather` gains a new `use_threads` parameter. It defaults to `True`, which keeps the existing behavior.


⚠️ GitHub issue #41815 has been automatically assigned in GitHub to PR creator.

@pjh40 force-pushed the GH41815_write_feather_use_threads branch 2 times, most recently from a857aae to 4333111 (June 2, 2024 16:34)
@pjh40 force-pushed the GH41815_write_feather_use_threads branch from 4333111 to ec68ec3 (June 12, 2024 13:36)
@pjh40 force-pushed the GH41815_write_feather_use_threads branch from ec68ec3 to 3669ba2 (June 24, 2024 22:49)
@pjh40 force-pushed the GH41815_write_feather_use_threads branch from 3669ba2 to 64fa348 (July 10, 2024 16:42)
@pjh40 force-pushed the GH41815_write_feather_use_threads branch from 64fa348 to 3dc0af6 (August 10, 2024 18:48)
@pjh40 force-pushed the GH41815_write_feather_use_threads branch from 3dc0af6 to 4f673fd (September 16, 2024 16:44)

Allow the caller to specify serial execution of `write_feather` with
the `use_threads=False` option, or allow for a threaded implementation
with the default `use_threads=True` option.  The serial option is
useful in `atexit` contexts, where it is illegal to start a
`concurrent.futures.ThreadPoolExecutor`.

Test that `write_feather(..., use_threads=False)` produces the same
result as `write_feather(..., use_threads=True)` for a
`pandas.DataFrame`.

@pjh40 force-pushed the GH41815_write_feather_use_threads branch from 4f673fd to 838d59d (September 16, 2024 16:45)
@pjh40 closed this Sep 20, 2024
@pjh40 deleted the GH41815_write_feather_use_threads branch September 20, 2024 19:44
@pjh40 restored the GH41815_write_feather_use_threads branch September 20, 2024 19:46