Preserve physical row order during deduplication from incremental sync #100
This pull request fixes an issue where the target fails to correctly deduplicate rows that were extracted in succession, due to identical _sdc_received_at and _sdc_extracted_at timestamps. The solution is to preserve physical / stored row order by sorting directly on columns, instead of using the COALESCE function.
Example
When COALESCE is used in the ORDER BY, rows with the same values are considered identical, and the natural descending sort order is not preserved.
Before Fix (COALESCE used for deduplication)
Oldest
Older
Current
After Fix (direct sorting on columns)
Oldest
Older
Current
Impact Assessment
This is not a breaking change for most sources, as long as they consistently include or exclude the _sdc_extracted_at value. If only some rows have this timestamp, those rows will be prioritized, but this scenario is unlikely.
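The mixed case described above can be sketched in plain Python. This is an illustration only, not the target's actual query: the argument order of the COALESCE call, the NULLS LAST behavior of the direct sort, the integer timestamps, and the helper names (coalesce_key, direct_key, latest) are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, TypedDict


class Row(TypedDict):
    id: int
    value: str
    _sdc_received_at: int
    _sdc_extracted_at: Optional[int]


def coalesce_key(row: Row) -> int:
    # Assumed shape of the old ordering:
    # COALESCE(_sdc_extracted_at, _sdc_received_at) collapses both columns
    # into a single value.
    extracted = row["_sdc_extracted_at"]
    return extracted if extracted is not None else row["_sdc_received_at"]


def direct_key(row: Row) -> tuple:
    # Assumed shape of the new ordering: each column is compared on its own,
    # and a missing _sdc_extracted_at ranks lowest (NULLS LAST when descending).
    extracted = row["_sdc_extracted_at"]
    return (extracted is not None, extracted or 0, row["_sdc_received_at"])


def latest(rows: List[Row], key: Callable[[Row], object]) -> Dict[int, Row]:
    # Keep the highest-ranked row per id, similar to filtering on row number 1.
    winners: Dict[int, Row] = {}
    for row in sorted(rows, key=key, reverse=True):
        winners.setdefault(row["id"], row)
    return winners


rows: List[Row] = [
    {"id": 1, "value": "with_extracted", "_sdc_received_at": 5, "_sdc_extracted_at": 5},
    {"id": 1, "value": "without_extracted", "_sdc_received_at": 10, "_sdc_extracted_at": None},
]

by_coalesce = latest(rows, coalesce_key)
by_columns = latest(rows, direct_key)
```

Under these assumptions, by_coalesce keeps the row without _sdc_extracted_at (its received timestamp is larger), while by_columns keeps the row that has _sdc_extracted_at. That is the prioritization the impact assessment refers to.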