I have found only one reference to the `foreachBatch` function in the `StreamWriter` class (spark.writers.stream.py:292).
One additional consideration after reading the docs:

> This function behaves differently in Spark Connect mode. See examples.
> In Connect, the provided function doesn't have access to variables defined outside of it.
Examples from the doc:

```python
>>> import time
>>> df = spark.readStream.format("rate").load()
>>> my_value = -1
>>> def func(batch_df, batch_id):
...     global my_value
...     my_value = 100
...     batch_df.collect()
...
>>> q = df.writeStream.foreachBatch(func).start()
>>> time.sleep(3)
>>> q.stop()
>>> # if in Spark Connect, my_value = -1, else my_value = 100
```
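The Connect behavior mirrors ordinary process isolation: when the batch function runs in a different process than the driver script, it mutates its own copy of a global, and the caller never sees the change. A minimal stdlib sketch of the same effect, with no Spark involved (names are illustrative):

```python
import multiprocessing as mp

my_value = -1


def func(batch_id):
    # Mutates the *child process's* copy of the global only; the
    # parent process keeps its own value, just like the driver in
    # Spark Connect mode keeps my_value = -1.
    global my_value
    my_value = 100


if __name__ == "__main__":
    p = mp.Process(target=func, args=(0,))
    p.start()
    p.join()
    print(my_value)  # still -1 in the parent process
```

This is only an analogy for why outer variables are not visible from the function in Connect mode, not a description of how Spark serializes the function.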
This does not currently cause problems in Koheesio, but it may be worth making the behavior explicit in the `StreamWriter` documentation for the `batch_function` field.
`SynchronizeDeltaToSnowflakeTask` does not make additional calls to `foreachBatch`, as it reuses `StreamWriter`.
Is your feature request related to a problem? Please describe.
This issue is related to #33. The goal is to investigate the impact of `foreachBatch` calls in the code.