For some context, the schema I'm trying to use Ghostferry with is based on EAV. Each table is a property table and an object will typically correspond to multiple records across various tables.
I'm also trying to implement sharding support with an implementation of CopyFilter.
My CopyFilter implementation generates object IDs for a given shard in batches and uses those IDs in BuildSelect to form the base SQL query to copy records, e.g.
SELECT ... FROM ... WHERE pagination_key IN (object_id1, ..., object_id20)
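To make the query shape concrete, here is a minimal sketch of building that IN clause with placeholders, using only the Go standard library. The helper name buildInQuery and its parameters are made up for illustration; this is not Ghostferry's BuildSelect signature, just the kind of SQL my filter ends up producing.

```go
package main

import "strings"

// buildInQuery is a hypothetical helper (not part of Ghostferry). It returns a
// parameterized SELECT for one batch of object IDs, plus the matching arguments.
// The table and column names are assumed to be trusted identifiers.
func buildInQuery(table, column string, ids []uint64) (string, []interface{}) {
	if len(ids) == 0 {
		// An empty IN () list is not valid SQL, so report "nothing to query".
		return "", nil
	}

	// One "?" placeholder per ID, separated by ", ".
	placeholders := strings.TrimSuffix(strings.Repeat("?, ", len(ids)), ", ")

	args := make([]interface{}, len(ids))
	for i, id := range ids {
		args[i] = id
	}

	query := "SELECT * FROM " + table + " WHERE " + column + " IN (" + placeholders + ")"
	return query, args
}
```

The returned query and arguments can be passed to something like database/sql's Query method. Passing the IDs as arguments rather than formatting them into the string keeps the statement parameterized.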
I'm currently running into a problem where this query sometimes yields 0 rows for a property table for a batch of object IDs, leading the Cursor to terminate iteration early even though there are still objects in the shard left to process.
Ideally, the flow I need looks something like:
1. Generate a batch of object IDs
2. Copy all records across all property tables corresponding to these object IDs
3. Terminate DataIterators if no more object IDs remain in shard
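The flow above can be sketched roughly as follows. Everything here (batchSource, copyFunc, copyShard, sliceSource) is a hypothetical name I made up to illustrate the control flow; none of it is Ghostferry's API. The key point is that the loop ends only when the ID source is exhausted, and a table returning 0 rows for a batch is not treated as the end of the data.

```go
package main

// batchSource is a hypothetical source of object ID batches for one shard.
type batchSource interface {
	// nextBatch returns the next batch of IDs; ok is false once the shard is exhausted.
	nextBatch() (ids []uint64, ok bool)
}

// copyFunc is a hypothetical callback that copies the rows of one property
// table matching ids and reports how many rows it copied.
type copyFunc func(table string, ids []uint64) (int, error)

// copyShard copies every batch across every table and returns the total row
// count. It stops only when the source has no more batches.
func copyShard(src batchSource, tables []string, copyRows copyFunc) (int, error) {
	total := 0
	for {
		ids, ok := src.nextBatch()
		if !ok {
			return total, nil // no object IDs left in the shard
		}
		for _, table := range tables {
			n, err := copyRows(table, ids)
			if err != nil {
				return total, err
			}
			total += n // n == 0 is fine: this table has no rows for this batch
		}
	}
}

// sliceSource is an in-memory batchSource, used here only for illustration.
type sliceSource struct {
	batches [][]uint64
	next    int
}

func (s *sliceSource) nextBatch() ([]uint64, bool) {
	if s.next >= len(s.batches) {
		return nil, false
	}
	ids := s.batches[s.next]
	s.next++
	return ids, true
}
```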
I don't believe this is possible without opening a PR against ghostferry, but let me know if I'm missing something. If it isn't possible, do you have any suggestions on how to implement this?
I'm thinking I'll need to have the DataIterator understand these object ID batches and call dataIterator.Run() repeatedly, once per batch, before exiting. I see a few options here:
A: Define an interface on DataIterator that supports different termination conditions
B: Put the current DataIterator implementation behind an interface so Ferry can support an alternate implementation
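For option A, this is roughly the shape I have in mind. It is only a sketch under my own assumptions: terminationCondition, stopOnEmptyBatch, stopAfterBatches, and iterate are hypothetical names, and the loop is a toy stand-in for the iterator, not Ghostferry's actual DataIterator or Cursor code.

```go
package main

// terminationCondition is a hypothetical interface that decides when
// iteration should stop, given the row count of the most recent fetch.
type terminationCondition interface {
	done(rowsInLastBatch int) bool
}

// stopOnEmptyBatch models the behaviour described in this issue:
// the first fetch that returns zero rows ends iteration.
type stopOnEmptyBatch struct{}

func (stopOnEmptyBatch) done(rows int) bool { return rows == 0 }

// stopAfterBatches ignores row counts and stops once a known number of
// object ID batches has been processed.
type stopAfterBatches struct {
	remaining int
}

func (s *stopAfterBatches) done(int) bool {
	s.remaining--
	return s.remaining <= 0
}

// iterate calls fetch until cond reports done and returns how many fetches ran.
func iterate(fetch func() int, cond terminationCondition) int {
	runs := 0
	for {
		rows := fetch()
		runs++
		if cond.done(rows) {
			return runs
		}
	}
}
```

With fetches that return 5, 0, 7, 0 rows, stopOnEmptyBatch stops after the second fetch and never reaches the 7 rows in the third batch, while stopAfterBatches with remaining set to 4 processes all four.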
Any guidance here would be appreciated - thanks for your time.