[FEATURE] - DBR 14.3 support - foreachbatch impacts #56

Open
riccamini opened this issue Aug 1, 2024 · 2 comments · May be fixed by #97
Labels
enhancement New feature or request
Milestone

Comments

@riccamini
Contributor

Is your feature request related to a problem? Please describe.

This issue is related to #33. The goal is to investigate how DBR 14.3 support affects the foreachBatch calls in the code.

@riccamini added the enhancement (New feature or request) label on Aug 1, 2024
@riccamini
Contributor Author

I found only one reference to the foreachBatch function (DataStreamWriter.foreachBatch, not a DataFrame method), in the StreamWriter class (spark.writers.stream.py:292).

One additional consideration after reading the documentation for foreachBatch:

    This function behaves differently in Spark Connect mode. See examples.
    In Connect, the provided function doesn't have access to variables defined outside of it.

    Examples
    --------
    >>> import time
    >>> df = spark.readStream.format("rate").load()
    >>> my_value = -1
    >>> def func(batch_df, batch_id):
    ...     global my_value
    ...     my_value = 100
    ...     batch_df.collect()
    ...
    >>> q = df.writeStream.foreachBatch(func).start()
    >>> time.sleep(3)
    >>> q.stop()
    >>> # if in Spark Connect, my_value = -1, else my_value = 100

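Koheesio itself does not rely on this pattern, but the limitation should probably be made explicit in the StreamWriter documentation for the batch_function field, since user-supplied batch functions that modify outer variables would behave differently under Spark Connect.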
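One way a batch_function note could be illustrated is a callback that keeps no state outside itself. The following is only a minimal sketch under assumptions: make_batch_function, metrics_dir, and FakeBatch are hypothetical names that exist in neither Koheesio nor PySpark, and the local JSON file stands in for whatever table or shared storage a real job would write to. It assumes that under Spark Connect the callback runs away from the client process, so results are persisted instead of being assigned to outer variables.

```python
import json
import tempfile
from pathlib import Path


def make_batch_function(metrics_dir: str):
    """Build a foreachBatch-style callback that keeps no state outside itself.

    `metrics_dir` is a hypothetical output location; a real job would more
    likely write to a table or shared storage reachable from the cluster.
    """

    def batch_function(batch_df, batch_id):
        # Compute inside the callback; never assign to globals or outer variables.
        row_count = batch_df.count()
        # Persist the result externally so the caller can read it back later.
        out_file = Path(metrics_dir) / f"batch_{batch_id}.json"
        out_file.write_text(json.dumps({"batch_id": batch_id, "rows": row_count}))

    return batch_function


class FakeBatch:
    """Stand-in for a Spark DataFrame so the sketch runs without Spark."""

    def __init__(self, rows):
        self._rows = rows

    def count(self):
        return len(self._rows)


# Simulate one micro-batch with three rows.
metrics_dir = tempfile.mkdtemp()
callback = make_batch_function(metrics_dir)
callback(FakeBatch([1, 2, 3]), 0)

# The caller reads the result from the sink, not from a mutated variable.
result = json.loads((Path(metrics_dir) / "batch_0.json").read_text())
```

With a real stream, the callback would be passed as df.writeStream.foreachBatch(make_batch_function(...)).start(), and the caller would read results back from the sink once the query has processed a batch.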

SynchronizeDeltaToSnowflakeTask makes no additional calls to foreachBatch, as it reuses StreamWriter.

@dannymeijer
Member

@riccamini - can you verify that this issue is resolved with the upcoming 0.9 release?

@dannymeijer added this to the 0.9.0 milestone on Nov 8, 2024
@dannymeijer linked a pull request on Nov 11, 2024 that will close this issue
Projects
Status: In progress
2 participants