Alternate exit criteria for DataIterators #335

Open
vyshah opened this issue Jul 19, 2022 · 0 comments

For some context, the schema I'm trying to use Ghostferry with is based on EAV. Each table is a property table and an object will typically correspond to multiple records across various tables.

I'm also trying to implement sharding support with an implementation of CopyFilter.

My CopyFilter implementation generates object IDs for a given shard in batches and uses those IDs in BuildSelect to form the base SQL query to copy records, e.g.

SELECT ... FROM ... WHERE pagination_key IN (object_id1, ..., object_id20)
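
For illustration, a query of this shape could be assembled for one batch of IDs roughly as follows. This is only a sketch using the standard library: `buildInQuery`, the table name `property_table`, and the `SELECT *` column list are hypothetical stand-ins, and it is not how Ghostferry's BuildSelect is actually written.

```go
package main

import (
	"fmt"
	"strings"
)

// buildInQuery returns a parameterized SELECT for one batch of object IDs.
// Hypothetical helper: table and keyColumn are supplied by the caller and
// are not part of any Ghostferry API.
func buildInQuery(table, keyColumn string, ids []uint64) (string, []interface{}, error) {
	// An empty IN () list is not valid MySQL, so reject empty batches early.
	if len(ids) == 0 {
		return "", nil, fmt.Errorf("empty object ID batch")
	}

	// One "?" placeholder per ID, joined as "?, ?, ?".
	placeholders := strings.TrimSuffix(strings.Repeat("?, ", len(ids)), ", ")
	query := fmt.Sprintf("SELECT * FROM %s WHERE %s IN (%s)", table, keyColumn, placeholders)

	args := make([]interface{}, len(ids))
	for i, id := range ids {
		args[i] = id
	}
	return query, args, nil
}

func main() {
	query, args, err := buildInQuery("property_table", "pagination_key", []uint64{11, 12, 13})
	if err != nil {
		panic(err)
	}
	fmt.Println(query)
	fmt.Println(args...)
}
```

Note that nothing in such a query guarantees a non-empty result: a property table may simply hold no records for the objects in a given batch.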

I'm currently running into a problem where this query sometimes yields 0 rows for a property table for a batch of object IDs, leading the Cursor to terminate iteration early even though there are still objects in the shard left to process.

Ideally, the flow I need looks something like:

  1. Generate a batch of object IDs
  2. Copy all records across all property tables corresponding to these object IDs
  3. Terminate DataIterators if no more object IDs remain in shard
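
To make the intended flow concrete, here is a minimal, self-contained sketch of that loop. All names (`IDBatcher`, `CopyFunc`, `copyShard`, `sliceBatcher`) are hypothetical and do not exist in Ghostferry; the point is only that the loop exits when the ID source is exhausted, not when one table returns zero rows.

```go
package main

import "fmt"

// IDBatcher is a hypothetical source of object-ID batches for one shard.
type IDBatcher interface {
	// NextBatch returns the next batch, or an empty slice once the shard
	// has no more object IDs.
	NextBatch() []uint64
}

// CopyFunc is a hypothetical stand-in for copying one table's rows for a
// batch of object IDs. It returns the number of rows copied.
type CopyFunc func(table string, ids []uint64) (int, error)

// copyShard copies every table for every batch. A table yielding zero rows
// for a batch does not end iteration; only an exhausted batcher does.
func copyShard(b IDBatcher, tables []string, copyRows CopyFunc) (int, error) {
	total := 0
	for {
		ids := b.NextBatch() // step 1: generate a batch of object IDs
		if len(ids) == 0 {
			return total, nil // step 3: no IDs remain in the shard
		}
		for _, table := range tables { // step 2: copy across all property tables
			n, err := copyRows(table, ids)
			if err != nil {
				return total, fmt.Errorf("copy %s: %w", table, err)
			}
			total += n
		}
	}
}

// sliceBatcher serves precomputed batches; used here only for illustration.
type sliceBatcher struct{ batches [][]uint64 }

func (s *sliceBatcher) NextBatch() []uint64 {
	if len(s.batches) == 0 {
		return nil
	}
	next := s.batches[0]
	s.batches = s.batches[1:]
	return next
}

func main() {
	b := &sliceBatcher{batches: [][]uint64{{1, 2}, {3, 4}}}
	// Pretend prop_b holds no rows for the first batch of objects.
	copyRows := func(table string, ids []uint64) (int, error) {
		if table == "prop_b" && ids[0] == 1 {
			return 0, nil
		}
		return len(ids), nil
	}
	total, err := copyShard(b, []string{"prop_a", "prop_b"}, copyRows)
	fmt.Println(total, err)
}
```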

I don't believe this is possible without opening a PR against ghostferry, but let me know if I'm missing something. If it isn't possible, do you have any suggestions on how to implement this?

I'm thinking the DataIterator will need to understand these object ID batches and call dataIterator.Run() multiple times, once per batch, before exiting. I see two options here:

  • A: Define an interface on DataIterator that supports different termination conditions
  • B: Put the current DataIterator implementation behind an interface so Ferry can support an alternate implementation
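
As a rough idea of what option A might look like, the sketch below separates the stop decision from the fetch loop. `ExitCriteria`, `StopOnEmptyBatch`, `StopWhenIDsExhausted`, and `iterate` are all hypothetical names invented for this example, not existing Ghostferry types.

```go
package main

import "fmt"

// ExitCriteria is a hypothetical hook that decides when iteration ends.
type ExitCriteria interface {
	// ShouldStop is called after each fetch with the number of rows returned.
	ShouldStop(rowsFetched int) bool
}

// StopOnEmptyBatch models the behavior described above: the first empty
// result ends iteration.
type StopOnEmptyBatch struct{}

func (StopOnEmptyBatch) ShouldStop(rowsFetched int) bool { return rowsFetched == 0 }

// StopWhenIDsExhausted ignores row counts and stops once a known number of
// object-ID batches has been processed.
type StopWhenIDsExhausted struct{ Remaining int }

func (s *StopWhenIDsExhausted) ShouldStop(int) bool {
	s.Remaining--
	return s.Remaining <= 0
}

// iterate calls fetch until the criteria says stop and returns how many
// fetches were made.
func iterate(fetch func() int, criteria ExitCriteria) int {
	fetches := 0
	for {
		rows := fetch()
		fetches++
		if criteria.ShouldStop(rows) {
			return fetches
		}
	}
}

func main() {
	// Rows returned per batch; the second batch matches nothing in this table.
	rowCounts := []int{5, 0, 7}
	newFetch := func() func() int {
		i := 0
		return func() int {
			if i >= len(rowCounts) {
				return 0
			}
			n := rowCounts[i]
			i++
			return n
		}
	}

	// Stops at the empty second batch, so the third is never fetched.
	fmt.Println(iterate(newFetch(), StopOnEmptyBatch{}))
	// Processes all three batches regardless of row counts.
	fmt.Println(iterate(newFetch(), &StopWhenIDsExhausted{Remaining: 3}))
}
```

Option B would instead hide the whole loop behind an interface, which is more flexible but leaves each implementation to reimplement the fetch logic.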

Any guidance here would be appreciated - thanks for your time.
