Java SDK pipelines using the FnApi can become wedged if the single data stream is blocked attempting to queue records for a process bundle handler which was never registered.
It seems like the same hang could also occur in the error-handling path: if we created the bundle handler successfully but hit an exception during processing, we would unregister the instruction id, and more data could then show up for that instruction id. The healthy case ensures that the data for the instruction is drained before returning and removing the registration.
There could be a control stream issue where the instruction id doesn't arrive, but I think there is a race with error handling that could explain why the future is never notified. It seems possible that a handler for the instruction id was received and registered but then removed because processing completed with an error. If an element for that instruction id arrives on the data stream after this point, it will create a new entry in the map for the removed instruction id and then block forever, because the instruction id will never be registered again and so its future will never be notified.
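To make the suspected race concrete, here is a minimal sketch in plain Java. It is not the Beam implementation: `RaceSketchMultiplexer`, its methods, and the `String` element type are hypothetical stand-ins, and the wait is bounded only so the sketch terminates (the issue describes the real wait as never returning).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

/** Hypothetical stand-in for the data-stream multiplexer; not Beam's actual class. */
class RaceSketchMultiplexer {
  // instruction id -> future that is completed once a consumer is registered
  private final ConcurrentHashMap<String, CompletableFuture<Consumer<String>>> consumers =
      new ConcurrentHashMap<>();

  /** Control path: a bundle handler registers its consumer for an instruction id. */
  void register(String instructionId, Consumer<String> consumer) {
    consumers.computeIfAbsent(instructionId, id -> new CompletableFuture<>()).complete(consumer);
  }

  /** Control path: processing finished (or failed) and the registration is removed. */
  void unregister(String instructionId) {
    consumers.remove(instructionId);
  }

  /**
   * Data path: deliver an element, waiting for the consumer to be registered.
   * Returns false if no consumer showed up within waitMillis.
   */
  boolean deliver(String instructionId, String element, long waitMillis) {
    // If the id was already unregistered, this silently creates a fresh,
    // never-completed future for it.
    CompletableFuture<Consumer<String>> future =
        consumers.computeIfAbsent(instructionId, id -> new CompletableFuture<>());
    try {
      future.get(waitMillis, TimeUnit.MILLISECONDS).accept(element);
      return true;
    } catch (TimeoutException e) {
      return false; // with an unbounded wait, the single data stream would be stuck here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

In this sketch, calling `register`, then `unregister`, then `deliver` for the same id leaves `deliver` waiting on a future that nothing will ever complete, which matches the sequence described above.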
To fix this, I think we could:
- catch and retry the exception creating the bundle processor
- add a timeout to the data multiplexing wait for the instruction to arrive. If the timeout is exceeded we'd have to error out the data stream, causing an SDK restart, but that is preferable to being stuck forever. We could reduce how often this triggers for the races identified above by keeping a cache of recently errored instructions.
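The timeout idea, combined with a cache of recently errored instructions, could look roughly like the sketch below. Everything here is an assumption made for illustration: the class `TimeoutMultiplexerSketch`, the `Outcome` enum, the cache size, and the `String` element type are hypothetical and do not come from the Beam code base.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

/** Hypothetical sketch of a bounded wait plus a recently-failed cache; not Beam's code. */
class TimeoutMultiplexerSketch {
  enum Outcome { DELIVERED, DROPPED_RECENTLY_FAILED, TIMED_OUT }

  private static final int MAX_REMEMBERED_FAILURES = 1000; // arbitrary size for the sketch

  private final ConcurrentHashMap<String, CompletableFuture<Consumer<String>>> consumers =
      new ConcurrentHashMap<>();

  // Bounded, insertion-ordered map used as a set of recently failed instruction ids.
  private final Map<String, Boolean> recentlyFailed =
      Collections.synchronizedMap(
          new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
              return size() > MAX_REMEMBERED_FAILURES;
            }
          });

  void register(String instructionId, Consumer<String> consumer) {
    consumers.computeIfAbsent(instructionId, id -> new CompletableFuture<>()).complete(consumer);
  }

  /** Remember the failure before removing the registration so late data can be recognized. */
  void unregisterAfterFailure(String instructionId) {
    recentlyFailed.put(instructionId, Boolean.TRUE);
    consumers.remove(instructionId);
  }

  Outcome deliver(String instructionId, String element, long timeoutMillis) {
    if (recentlyFailed.containsKey(instructionId)) {
      return Outcome.DROPPED_RECENTLY_FAILED; // late data for a failed bundle: discard it
    }
    CompletableFuture<Consumer<String>> future =
        consumers.computeIfAbsent(instructionId, id -> new CompletableFuture<>());
    try {
      future.get(timeoutMillis, TimeUnit.MILLISECONDS).accept(element);
      return Outcome.DELIVERED;
    } catch (TimeoutException e) {
      consumers.remove(instructionId, future); // clean up the placeholder entry
      // The failure may have been recorded while we were waiting; check again.
      return recentlyFailed.containsKey(instructionId)
          ? Outcome.DROPPED_RECENTLY_FAILED
          : Outcome.TIMED_OUT; // the caller would fail the data stream here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return Outcome.TIMED_OUT;
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

The cache only narrows the window: an id evicted from the cache, or one that never reached the SDK at all, still ends in `TIMED_OUT`, which is where the data stream would be failed and the SDK restarted.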
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
Component: Python SDK
Component: Java SDK
Component: Go SDK
Component: Typescript SDK
Component: IO connector
Component: Beam YAML
Component: Beam examples
Component: Beam playground
Component: Beam katas
Component: Website
Component: Infrastructure
Component: Spark Runner
Component: Flink Runner
Component: Samza Runner
Component: Twister2 Runner
Component: Hazelcast Jet Runner
Component: Google Cloud Dataflow Runner