Preserve physical row order during deduplication from incremental sync #100

Open: emilklindt wants to merge 1 commit into main

This pull request fixes an issue where the target fails to deduplicate correctly when several rows for the same key are extracted in quick succession and therefore carry identical _sdc_received_at and _sdc_extracted_at timestamps. The fix preserves the physical (stored) row order by sorting directly on the timestamp columns instead of on a COALESCE expression.

Example

When COALESCE is used in the ORDER BY, rows with equal timestamp values are treated as ties, and the descending sort does not preserve their natural (stored) order. As a result, an older row can be selected instead of the most recent one.

Before Fix (COALESCE used for deduplication)

id  _sdc_received_at          _sdc_extracted_at         Value    Selected row
1   2024-10-02T12:00:00.000Z  2024-10-02T12:00:00.000Z  Oldest   Selected
1   2024-10-02T12:00:00.000Z  2024-10-02T12:00:00.000Z  Older    Ignored
1   2024-10-02T12:00:00.000Z  2024-10-02T12:00:00.000Z  Current  Ignored

After Fix (direct sorting on columns)

id  _sdc_received_at          _sdc_extracted_at         Value    Selected row
1   2024-10-02T12:00:00.000Z  2024-10-02T12:00:00.000Z  Oldest   Ignored
1   2024-10-02T12:00:00.000Z  2024-10-02T12:00:00.000Z  Older    Ignored
1   2024-10-02T12:00:00.000Z  2024-10-02T12:00:00.000Z  Current  Selected
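The intended selection rule from the example can be sketched in Python as below. This is a minimal illustration, not the target's SQL: Row and dedupe_latest are hypothetical names, and the stored order is made explicit with a hypothetical position index rather than relying on the database to keep tied rows in physical order. Timestamps are compared as ISO-8601 strings, which sort chronologically only when they share one format and UTC offset.

```python
from typing import NamedTuple


class Row(NamedTuple):
    # Hypothetical in-memory stand-in for a staged row; the real target works in SQL.
    id: int
    received_at: str   # ISO-8601 strings in one format and offset sort chronologically
    extracted_at: str
    value: str


def dedupe_latest(rows: list[Row]) -> dict[int, Row]:
    """Keep one row per id: the newest (extracted_at, received_at) wins,
    and identical timestamps are broken by stored position (last stored wins)."""
    winners: dict[int, tuple[tuple[str, str, int], Row]] = {}
    for position, row in enumerate(rows):
        # `position` is a hypothetical explicit index for the physical row order.
        key = (row.extracted_at, row.received_at, position)
        current = winners.get(row.id)
        if current is None or key > current[0]:
            winners[row.id] = (key, row)
    return {row_id: row for row_id, (_, row) in winners.items()}


ts = "2024-10-02T12:00:00.000Z"
rows = [Row(1, ts, ts, "Oldest"), Row(1, ts, ts, "Older"), Row(1, ts, ts, "Current")]
print(dedupe_latest(rows)[1].value)  # Current
```

With three rows sharing identical timestamps, the last stored row ("Current") is kept, matching the "After Fix" table; a strictly newer timestamp still wins regardless of position.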

Impact Assessment

This is not a breaking change for most sources, as long as each source consistently either includes or omits the _sdc_extracted_at value. If only some rows carry this timestamp, those rows will be prioritized over rows without it, but this scenario is unlikely.
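The mixed case described above can be sketched as a priority key in which rows that carry _sdc_extracted_at rank above rows that lack it, roughly what a descending sort with NULLs placed last would do. The NULLS LAST placement is an assumption about the target's ordering, and Row, priority and pick_latest are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    # Hypothetical in-memory stand-in for a staged row.
    id: int
    received_at: str
    extracted_at: Optional[str]  # may be missing for some sources
    value: str


def priority(row: Row, position: int) -> tuple[bool, str, str, int]:
    # Assumed NULLS LAST behaviour: rows with extracted_at outrank rows without it,
    # then newer timestamps win, then the later stored position breaks ties.
    has_extracted = row.extracted_at is not None
    return (has_extracted, row.extracted_at or "", row.received_at, position)


def pick_latest(rows: list[Row]) -> Row:
    """Return the highest-priority row among rows that share one key."""
    _, best_row = max(enumerate(rows), key=lambda pr: priority(pr[1], pr[0]))
    return best_row


mixed = [
    Row(1, "2024-10-02T12:00:05.000Z", None, "No extracted_at"),
    Row(1, "2024-10-02T12:00:00.000Z", "2024-10-02T12:00:00.000Z", "Has extracted_at"),
]
print(pick_latest(mixed).value)  # Has extracted_at
```

When a source consistently omits the timestamp, every row has the same first key element, so the comparison falls through to _sdc_received_at and stored position, which is why consistent sources are unaffected.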
