[FEATURE] DataFrame type fails pydantic validation when using Spark Connect / Serverless #62
Comments
Maybe take a look here: https://github.com/Nike-Inc/koheesio/tree/33-feature-ensure-that-we-can-support-dbr-143lts
We will have to integrate the changes you are making into this branch.
This commit especially: 2e86208#diff-5a83c2ed86f340ff6a3110cd2a8a71ddd29947487301ad7941e4d99fbc8def6cR12
@mikita-sakalouski To be honest, I would prefer to treat this as a separate issue and merge it to main as part of #59.
@maxim-mityutko Looks like PR #63 covers the requested functionality; please check and close the request.
Solved with #63 |
Is your feature request related to a problem? Please describe.
Pydantic enforces strict types. In the current implementation, all Spark-related logic (readers, writers, transforms, integrations) expects the DataFrame class (pyspark.sql.DataFrame) as input or output. However, under Spark Connect, and consequently on Serverless compute, the DataFrame class is pyspark.sql.connect.dataframe.DataFrame, which causes pydantic model validation errors.
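The mechanism of the failure can be illustrated without pyspark. In a minimal sketch (the classes below are hypothetical stand-ins for the two pyspark DataFrame flavours, not the real classes), pydantic validates an arbitrary-type field with a plain isinstance() check, so an object of a different class with the same role is rejected:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class ClassicDataFrame:
    """Stand-in for pyspark.sql.DataFrame (hypothetical)."""


class ConnectDataFrame:
    """Stand-in for pyspark.sql.connect.dataframe.DataFrame (hypothetical)."""


class Step(BaseModel):
    # arbitrary_types_allowed lets pydantic validate non-pydantic classes,
    # but the check is a strict isinstance() against the annotated class
    model_config = ConfigDict(arbitrary_types_allowed=True)
    df: ClassicDataFrame


Step(df=ClassicDataFrame())       # passes: exact class match

try:
    Step(df=ConnectDataFrame())   # rejected: different class, same role
except ValidationError:
    print("ConnectDataFrame rejected")
```

This mirrors what happens when a model annotated with pyspark.sql.DataFrame receives a Connect DataFrame.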
Describe the solution you'd like
Models should accept both native and Connect DataFrames as valid input / output.
Describe alternatives you've considered
...
Additional context
...