Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core feature] Integrate with Soda.io #2884

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

10sharmashivam
Copy link

@10sharmashivam 10sharmashivam commented Oct 31, 2024

Tracking issue

Reference Issue - #2963

Why are the changes needed?

The Soda.io plugin integrates data quality monitoring into Flyte, enabling users to automatically perform data quality checks on their datasets. This integration ensures that data quality is maintained throughout workflows, providing users with timely insights into data issues and enhancing the reliability and trustworthiness of their data-driven processes.

What changes were proposed in this pull request?

1.	Basic Integration with Soda.io: Enables execution of data quality checks using custom configurations.
2.	Configurable Scan Definitions: Users can specify a YAML file for scan definitions, allowing for greater flexibility.
3.	API Key Support: Integrates with Soda Cloud using a Custom API Key per User, enabling secure and personalized data quality scans instead of a single shared API key for centralized usage. i hope I choose correctly(please let me know, if I should choose otherwise)

How was this patch tested?

I tested this basic prototype of the plugin through basic unit tests locally, covering both positive and negative scenarios. The tests ensure that the new functionality works as expected and that any edge cases are handled gracefully.

Additional Comments

I’m submitting this PR as an initial implementation of the Soda.io plugin integration. At this stage, I’ve structured the plugin, created the SodaCheckConfig and SodaCheckTask classes, and registered the plugin with Flyte.

I’d greatly appreciate any feedback or guidance to ensure that I’m on the right track with this implementation. I’ll continue to refine the code and address any changes based on the team’s suggestions and feedback.

Looking forward to your insights!

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed off.

Future Enhancements

Looking ahead, I plan to implement the following enhancements to further improve the plugin:

•	Result Parsing and Reporting: Enhance the formatting of result outputs for better readability and usability.
•	Error Handling: Implement more robust error handling for configuration and API-related issues to improve user experience.
•	Scan Monitoring: Add support for monitoring the status of ongoing scans to provide real-time feedback to users.
•	Scheduling and Triggers: Enable automated scheduling or triggering of scans based on task dependencies within Flyte, enhancing the plugin’s automation capabilities.

Please let me know if I chose the correct approach and should continue with this and enhance this plugin!

Copy link

codecov bot commented Oct 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 40.38%. Comparing base (6bf6f8e) to head (ca275b9).
Report is 5 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (6bf6f8e) and HEAD (ca275b9). Click for more details.

HEAD has 3 uploads less than BASE
Flag BASE (6bf6f8e) HEAD (ca275b9)
4 1
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2884       +/-   ##
===========================================
- Coverage   76.79%   40.38%   -36.42%     
===========================================
  Files         196      199        +3     
  Lines       20546    20781      +235     
  Branches     2646     2671       +25     
===========================================
- Hits        15779     8393     -7386     
- Misses       4068    12287     +8219     
+ Partials      699      101      -598     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@10sharmashivam
Copy link
Author

10sharmashivam commented Nov 8, 2024

Hi @kumare3 @wild-endeavor @thomasjpfan @Future-Outlier @eapolinario @pingsutw @samhita-alla @cosmicBboy, I wanted to follow up on this PR, about the basic prototype of this plugin and check if I’m heading in the right direction with the integration.

Also had a few questions in mind,

  1. API Key Approach: I’ve chosen to use a Custom API Key per User for the Soda.io integration, rather than a single shared API key. Please let me know if this approach seems correct or if I should consider an alternative method.
  2. Basic Functionality: I’ve structured the plugin and implemented the SodaCheckConfig and SodaCheckTask classes. Is this the right way to approach the integration, or would you suggest any improvements?
  3. Future Enhancements: I’ve listed potential future enhancements (e.g., result parsing, error handling, scan monitoring). Do you think these are aligned with the direction of Flyte, or should I consider different enhancements?

I look forward to your insights and any guidance on improving this PR. Please let me know if I’m on the right track, or if there are areas I should revisit.

Thanks again!

@davidmirror-ops
Copy link

@10sharmashivam thank you. Please notice that recently we merged this RFC. It means that community-contributed plugins should live under plugins/community. I'll ask plugin maintainers for a review again

@10sharmashivam
Copy link
Author

@10sharmashivam thank you. Please notice that recently we merged this RFC. It means that community-contributed plugins should live under plugins/community. I'll ask plugin maintainers for a review again

Hi @davidmirror-ops, thanks for the update! I reviewed the recent RFC regarding community-contributed plugins and the proposed plugins/community structure. Could you confirm if this folder structure is fully implemented and ready for use?

Thanks again for your guidance—I’ll proceed based on your input.

@davidmirror-ops
Copy link

@10sharmashivam the folder is not created yet, waiting for the first contribution after the RFC :) So feel free to create it. Also, please check the lint errors

@10sharmashivam
Copy link
Author

@10sharmashivam the folder is not created yet, waiting for the first contribution after the RFC :) So feel free to create it. Also, please check the lint errors

@davidmirror-ops Thank you for the clarification! I’ve set up the plugins/community structure and moved the plugin there, following the RFC as discussed. Also, I’ve addressed the linting issues—everything should be in order now.

Let me know if there’s anything else you’d like me to adjust. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants