Alternate exit criteria for DataIterators #335

vyshah · 2022-07-19T19:29:54Z

For some context, the schema I'm trying to use Ghostferry with is based on EAV. Each table is a property table and an object will typically correspond to multiple records across various tables.

I'm also trying to implement sharding support with an implementation of CopyFilter.

My CopyFilter implementation generates object IDs for a given shard in batches and uses those IDs in BuildSelect to form the base SQL query to copy records, e.g.

SELECT ... FROM ... WHERE pagination_key IN (object_id1, ..., object_id20)
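
For illustration, a query of this shape can be assembled with one placeholder per object ID. This is only a minimal sketch using the Go standard library: `buildInQuery` and the table and column names in the usage note are hypothetical, and a real CopyFilter would build the statement inside its own BuildSelect rather than through a helper like this.

```go
package main

import (
	"fmt"
	"strings"
)

// buildInQuery is a hypothetical helper (not part of Ghostferry). It renders a
// SELECT restricted to one batch of object IDs and returns the SQL text with
// one "?" placeholder per ID, plus the matching argument list.
// Table and column names are interpolated as-is, so they must come from
// trusted schema metadata, never from user input.
func buildInQuery(table string, columns []string, objectIDs []uint64) (string, []interface{}, error) {
	if len(objectIDs) == 0 {
		// "IN ()" is not valid SQL, so an empty batch is rejected up front.
		return "", nil, fmt.Errorf("empty object ID batch for table %s", table)
	}

	// Produces "?, ?, ?" for three IDs.
	placeholders := strings.TrimSuffix(strings.Repeat("?, ", len(objectIDs)), ", ")

	args := make([]interface{}, len(objectIDs))
	for i, id := range objectIDs {
		args[i] = id
	}

	query := fmt.Sprintf("SELECT %s FROM %s WHERE pagination_key IN (%s)",
		strings.Join(columns, ", "), table, placeholders)
	return query, args, nil
}
```

Calling `buildInQuery("prop_color", []string{"object_id", "value"}, []uint64{1, 2, 3})` would return a statement ending in `IN (?, ?, ?)` together with three arguments. Note that the query itself carries no information about whether more object IDs remain, which is why a zero-row result is ambiguous.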

I'm currently running into a problem where this query sometimes yields 0 rows for a property table for a batch of object IDs, leading the Cursor to terminate iteration early even though there are still objects in the shard left to process.

Ideally, the flow I need looks something like:

1. Generate a batch of object IDs.
2. Copy all records across all property tables corresponding to these object IDs.
3. Terminate the DataIterators once no more object IDs remain in the shard.
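
The flow above could be sketched roughly as follows. None of these names exist in Ghostferry: `NextBatchFunc`, `CopyRowsFunc`, and `copyShard` are hypothetical stand-ins, used only to show that the exit condition depends on the object ID source and not on how many rows a given table returned.

```go
package main

// NextBatchFunc is a hypothetical source of object IDs for one shard.
// It returns up to batchSize IDs; an empty slice means the shard is exhausted.
type NextBatchFunc func(batchSize int) ([]uint64, error)

// CopyRowsFunc is a hypothetical stand-in for copying the rows of one
// property table that belong to the given object IDs. It reports how many
// rows were copied, which may legitimately be zero.
type CopyRowsFunc func(table string, objectIDs []uint64) (int, error)

// copyShard drives the copy from the object ID batches and returns the total
// number of rows copied.
func copyShard(next NextBatchFunc, copyRows CopyRowsFunc, tables []string, batchSize int) (int, error) {
	total := 0
	for {
		ids, err := next(batchSize)
		if err != nil {
			return total, err
		}
		if len(ids) == 0 {
			// The only exit condition: no object IDs remain in the shard.
			return total, nil
		}
		for _, table := range tables {
			n, err := copyRows(table, ids)
			if err != nil {
				return total, err
			}
			// n == 0 just means this table has no properties for this
			// batch; iteration continues with the next table and batch.
			total += n
		}
	}
}
```

With this shape, a property table that has no rows for one batch of object IDs does not stop the copy; only an empty batch from the ID source does.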

I don't believe this is possible without opening a PR against Ghostferry, but let me know if I'm missing something. If it isn't possible, do you have any suggestions on how to implement this?

I'm thinking I'll need to have the DataIterator understand these object ID batches and call dataIterator.Run() repeatedly, once per batch, before exiting. I see two options here:

A: Define an interface on DataIterator that supports different termination conditions
B: Put the current DataIterator implementation behind an interface so Ferry can support an alternate implementation
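
To make option A more concrete, here is one possible shape for a pluggable termination condition. This is a sketch under assumptions, not Ghostferry's API: `ExitCriteria`, `StopOnEmptyBatch`, `StopWhenIDsExhausted`, and the toy `iterate` loop are all hypothetical names.

```go
package main

// ExitCriteria is a hypothetical hook (not an existing Ghostferry type) that
// decides whether iteration should stop after a query has returned.
type ExitCriteria interface {
	// Done is called once per query with the number of rows it returned.
	Done(rowsInLastBatch int) bool
}

// StopOnEmptyBatch models the behavior described in this issue: a query that
// yields zero rows ends the iteration.
type StopOnEmptyBatch struct{}

func (StopOnEmptyBatch) Done(rowsInLastBatch int) bool {
	return rowsInLastBatch == 0
}

// StopWhenIDsExhausted ignores row counts and stops only once every object ID
// batch has been queried.
type StopWhenIDsExhausted struct {
	RemainingBatches int
}

func (s *StopWhenIDsExhausted) Done(rowsInLastBatch int) bool {
	s.RemainingBatches--
	return s.RemainingBatches <= 0
}

// iterate is a toy loop: rowCounts[i] is the number of rows the i-th query
// would return. It reports how many queries ran before the criteria said stop.
func iterate(rowCounts []int, exit ExitCriteria) int {
	queries := 0
	for _, rows := range rowCounts {
		queries++
		if exit.Done(rows) {
			break
		}
	}
	return queries
}
```

For per-batch row counts of 20, 0, 15, and 7, `iterate` with `StopOnEmptyBatch{}` stops after the second query, while `&StopWhenIDsExhausted{RemainingBatches: 4}` lets all four run. Option B would instead place the whole iterator behind an interface, which allows more freedom but means the alternate implementation has to reproduce whatever else the existing DataIterator does.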

Any guidance here would be appreciated - thanks for your time.
