[Bug] GroupByKey() fails if PCollection is empty and direct_running_mode is set #25315
SamHjelmfelt changed the title from "(Python) GroupByKey() fails if PCollection is empty and direct_running_mode is set" to "[Bug] GroupByKey() fails if PCollection is empty and direct_running_mode is set" on Feb 5, 2023.
We suspect this was caused by #26190. Another report implicating the same PR can be reproduced as follows:
Edit penguin_pipeline_local_e2e_test.py:
Run the test with following command:
This will be fixed in version 2.50.0. Thanks.
damccorm added the "done & done" label (issue has been reviewed after it was closed for verification, followups, etc.) on Aug 1, 2023.
Labels
API: python
apache-beam version: 2.44.0 (installed via pip)
Python version: 3.10.9
The first step of my pipeline collects all files ready for processing using fileio.MatchAll(). It is possible that no files are available, in which case fileio.MatchAll() outputs zero records. When zero records are output, the expected behavior is that no processing is done. Instead, if direct_running_mode is set to any value, GroupByKey() throws an exception.
Here is sample code to reproduce the error and examples for all 8 scenarios:
Without direct_running_mode:
With direct running mode:
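A minimal sketch of how the scenarios described above could be exercised. This is not the reporter's original code. It makes these assumptions: `beam.Create` stands in for `fileio.MatchAll()` as the source of an empty or non-empty PCollection; the "8 scenarios" are read as empty/non-empty input crossed with no `direct_running_mode` plus the three values `in_memory`, `multi_threading`, and `multi_processing`; and the helper names `option_args`, `scenarios`, and `run_scenario` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: the three values accepted by the --direct_running_mode option.
DIRECT_RUNNING_MODES = ("in_memory", "multi_threading", "multi_processing")


def option_args(mode: Optional[str]) -> List[str]:
    """Return pipeline arguments, adding --direct_running_mode only if given."""
    return [] if mode is None else [f"--direct_running_mode={mode}"]


def scenarios() -> List[Tuple[bool, Optional[str]]]:
    """Hypothetical grid of (input is empty?, direct_running_mode or None)."""
    modes = (None,) + DIRECT_RUNNING_MODES
    return [(empty, mode) for mode in modes for empty in (False, True)]


def run_scenario(empty: bool, mode: Optional[str]) -> Optional[Exception]:
    """Run Create -> GroupByKey once and return the exception raised, if any."""
    # Imported here so the helpers above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # beam.Create is a stand-in for a source that may yield zero records.
    elements = [] if empty else [("a", 1), ("a", 2), ("b", 3)]
    try:
        with beam.Pipeline(options=PipelineOptions(option_args(mode))) as p:
            _ = p | beam.Create(elements) | beam.GroupByKey()
    except Exception as exc:
        return exc
    return None


if __name__ == "__main__":
    for empty, mode in scenarios():
        error = run_scenario(empty, mode)
        status = "ok" if error is None else f"failed: {type(error).__name__}"
        print(f"empty={empty} direct_running_mode={mode}: {status}")
```

Running the script prints one line per scenario, which makes it easy to compare the empty-input rows with and without `direct_running_mode` on a given Beam version.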