Added AzureBlobFileSystem support for StructuredDatasets #1109
Signed-off-by: Nick Müller <[email protected]>
Force-pushed from 81b3370 to 82da680
Codecov Report
@@           Coverage Diff           @@
##           master    #1109   +/-   ##
=======================================
  Coverage   86.95%   86.95%
=======================================
  Files         276      276
  Lines       25492    25493    +1
  Branches     2865     2865
=======================================
+ Hits        22167    22168    +1
  Misses       2847     2847
  Partials      478      478
hey @MorpheusXAUT do you also have the data persistence plugin for abfs? is that what you meant by you previously added support for it under the hood?
@wild-endeavor oh, good point 🤔 I believe that still hasn't been cleaned up & contributed back to this repo, sorry about that. 😕
this pr is fine. thanks!
Signed-off-by: Nick Müller <[email protected]>
@wild-endeavor Looking at it again, I was actually mistaken last night. We don't use an extra (custom-written) datapersistence plugin for I've added
EDIT: not quite sure why that one check failed, looks unrelated to me?
rerunning the failed onnx job. not entirely sure what's happening there.
BIGQUERY = "bq"
S3 = "s3"
ABFS = "abfs"
GCS = "gs"
LOCAL = "/"
Is it time to turn this into an enum, @wild-endeavor?
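For illustration, here is a minimal sketch of what such an enum could look like. The values are the string constants from the diff above; the names `Protocol` and `protocol_from_uri` are hypothetical and not part of flytekit.

```python
from enum import Enum


class Protocol(str, Enum):
    """Hypothetical enum mirroring the protocol string constants in the diff."""

    BIGQUERY = "bq"
    S3 = "s3"
    ABFS = "abfs"
    GCS = "gs"
    LOCAL = "/"


def protocol_from_uri(uri: str) -> Protocol:
    """Map a URI to a Protocol member, treating scheme-less paths as local.

    Hypothetical helper for illustration only.
    """
    if "://" not in uri:
        return Protocol.LOCAL
    scheme = uri.split("://", 1)[0]
    # Enum lookup by value raises ValueError for an unknown scheme.
    return Protocol(scheme)
```

Mixing in `str` keeps the members comparable to the plain strings used today, so `Protocol.ABFS == "abfs"` still holds, while a typo in a scheme fails at lookup time instead of silently passing through.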
Congrats on merging your first pull request! 🎉
Signed-off-by: Nick Müller <[email protected]>
Signed-off-by: Nick Müller <[email protected]>
TL;DR
This PR adds support for storing StructuredDatasets using AzureBlobFileSystem (abfs).
Type
Are all requirements met?
Complete description
As discussed before, we've added support for abfs (using adlfs/stow under the hood) for StructuredDatasets by adding it to the registered protocols for transformers.
I've also noticed one file (plugins/flytekit-spark/flytekitplugins/spark/sd_transformers.py) which still used string constants and didn't support GCS either. As we're currently not using Spark anywhere, I wasn't able to verify this change though, so I can revert those lines if you'd prefer.
Tracking Issue
flyteorg/flyte#2709
Follow-up issue
NA
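The idea of "adding abfs to the registered protocols for transformers" can be sketched roughly as follows. This is not flytekit's actual registration API: the registry, `register`, `find_handler`, and `parquet_handler` are hypothetical names, and only the protocol strings come from the diff.

```python
from typing import Callable, Dict, Tuple

# Protocol strings as they appear in the diff; everything else is hypothetical.
BIGQUERY, S3, ABFS, GCS, LOCAL = "bq", "s3", "abfs", "gs", "/"
PARQUET = "parquet"

Handler = Callable[[str], str]
_REGISTRY: Dict[Tuple[str, str], Handler] = {}


def register(protocol: str, fmt: str, handler: Handler) -> None:
    """Register a handler for one (protocol, format) pair."""
    key = (protocol, fmt)
    if key in _REGISTRY:
        raise ValueError(f"handler already registered for {key}")
    _REGISTRY[key] = handler


def parquet_handler(uri: str) -> str:
    """Stand-in for a real encoder/decoder; only reports what it would handle."""
    return f"parquet:{uri}"


# Supporting a new storage backend amounts to adding its protocol to this list.
for protocol in [LOCAL, S3, GCS, ABFS]:
    register(protocol, PARQUET, parquet_handler)


def find_handler(uri: str, fmt: str) -> Handler:
    """Look up the handler by the URI's scheme; scheme-less paths are local."""
    protocol = uri.split("://", 1)[0] if "://" in uri else LOCAL
    try:
        return _REGISTRY[(protocol, fmt)]
    except KeyError:
        raise ValueError(f"no handler for protocol {protocol!r}, format {fmt!r}")
```

With a lookup keyed on protocol and format, a URI such as `abfs://container/data` resolves to the same Parquet handler as `s3://` or `gs://` URIs once `ABFS` is in the registered list, which is why a one-line change can be enough when the underlying filesystem layer already understands the scheme.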