ML Data-Pipeline Ingestion - Optimisation for scaling #32
Replies: 3 comments 9 replies
-
Druid is never supposed to be used as a source of truth. There are two approaches here:
In addition, don't generalize an exhaust job on Cassandra. Cassandra query patterns differ from table to table and need to be fine-tuned for the corresponding tables, as in ProgressExhaust or ResponseExhaust. You can simply create a ProjectExhaust data product similar to ProgressExhaust.
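To illustrate what "fine-tuned to the table" can mean, here is a minimal sketch that builds a CQL read restricted to one table's partition key, which is what lets Cassandra serve the query without a full scan. The keyspace, table, and column names (`ml_keyspace`, `projects`, `program_id`, and so on) are hypothetical and are not taken from the Sunbird schema; the actual ProgressExhaust and ResponseExhaust jobs may be structured quite differently.

```python
from typing import Sequence


def build_project_exhaust_query(
    keyspace: str, table: str, partition_key: str, columns: Sequence[str]
) -> str:
    """Build a parameterized CQL SELECT restricted to a single partition.

    All identifiers passed in are hypothetical placeholders; the real
    table layout decides which column is the partition key.
    """
    column_list = ", ".join(columns)
    # Restricting on the partition key keeps the read on one partition
    # instead of scanning the whole table.
    return f"SELECT {column_list} FROM {keyspace}.{table} WHERE {partition_key} = ?"


# Hypothetical usage: one query shape per table, keyed on that table's partition key.
query = build_project_exhaust_query(
    "ml_keyspace", "projects", "program_id", ["project_id", "status"]
)
```

A generic exhaust job cannot know each table's partition key in advance, which is one reason a dedicated data product per table may be the simpler design.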
-
@Shakthieshwari which block do Projects and Observations sit under currently? Is it SB Ed?
-
@Alok Gupta, sure, I will schedule a call in the
next couple of days to discuss the SL capabilities and their alignment
with the Sunbird BBs.
Regards
Vijayashree
On Tue, Sep 27, 2022 at 10:34 AM Alok Gupta wrote:
A couple of points:
1. There is no such thing as ML services in Sunbird. ML (Manage Learn) is a
construct in the context of use cases which an adopter might enable using
Sunbird BBs. @Shakthieshwari, can you please add Vijayshree to this thread?
I am not able to find her username.
2. @Shakthieshwari, please request Vijayshree and Khushboo to initiate a call
to discuss and finalize which "new components" SL has been contributing and
which BB these components should be in.
-
Hello Team,
We are planning a few ML data optimisations for scaling.
Jira ticket: https://project-sunbird.atlassian.net/browse/OB-57
Problem statement: Avoid deleting the Projects Druid datasource, which the Program Dashboard CSV uses.
Reason for deleting the datasource: The status of a project changes over time and Druid doesn't support updating a record, so every day we delete the entire datasource and re-ingest all of the data into Druid to get the updated status of a submission.
Concern: Handling huge volumes of data.
Approach (solution): Please check this Confluence doc, where we have detailed the design: https://project-sunbird.atlassian.net/l/cp/P7nq918u
Similar to OnDemandDruidExhaustJob, we need to create the OnDemandCassandraExhaustJob data product.
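The reasoning behind moving the exhaust to Cassandra can be sketched in a few lines. In an append-only store, every status change adds a row, so a reader has to scan everything to find the newest status per project. In a store keyed by primary key, a write to an existing key replaces the row, so nothing has to be deleted and reloaded. This is a plain in-memory simulation, not Druid or Cassandra client code, and the record shape `(project_id, status, updated_at)` is a hypothetical one chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: (project_id, status, updated_at)
Record = Tuple[str, str, int]


def latest_status_append_only(events: List[Record]) -> Dict[str, str]:
    """Append-only model: scan every row and keep the newest status per project."""
    newest: Dict[str, Tuple[int, str]] = {}
    for project_id, status, updated_at in events:
        if project_id not in newest or updated_at > newest[project_id][0]:
            newest[project_id] = (updated_at, status)
    return {project_id: status for project_id, (_, status) in newest.items()}


def upsert(table: Dict[str, str], record: Record) -> None:
    """Keyed model: writing an existing key overwrites the row in place.

    Simplification: this overwrites in arrival order, whereas Cassandra
    resolves conflicting writes by write timestamp.
    """
    project_id, status, _ = record
    table[project_id] = status


events: List[Record] = [
    ("p1", "started", 1),
    ("p2", "started", 2),
    ("p1", "submitted", 3),  # status change for an existing project
]

table: Dict[str, str] = {}
for event in events:
    upsert(table, event)
```

Both models end with the same view of the data, but the keyed table reaches it with one write per change, while the append-only model needs a full pass over the history each time.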
@SanthoshVasabhaktula @sowmya-dixit @anandp504, please give us your approval and suggestions on whether we can go ahead with this.
Cc: @aishwaryashikshalokam @Ashwiniev95 @Prateek-slokam @aks30 @kiranharidas187 @vijiurs @snehangsude
Please do the needful at the earliest, as this is the highest priority for the program launch.
Thanks