-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Align internal Airbyte columns with Dv2; Feat: Auto-add missing columns to Cache tables. #144
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as resolved.
This comment was marked as resolved.
@jbfbell, @evantahler - Cc'ing you here for visibility and to confirm the direction... This PR gets us closer to compatibility with the Dv2 specs, which I've documented / linked to in the related issue: A couple callouts/questions:
An important caveat:
Update: I just decided to go ahead and drop the loaded-at column and I've added an implementation of |
Nice work AJ!
We made a funny choice with Dv2 which tried to balance "give the users lots of data about the sync" with "users seem to be mad when we make too many new columns". That's how we ended up showing
So the long-running raw table are getting even more important now for the refresh work, especially as we are merging records across generations, and to power truncating refreshes without data-towntime. I think they are with us for the long haul now. If none of those words made sense (because we made them up recently), check out this draft of the updated user-facing doc and join
Casse-sensitivity is kind of per-destination now. I'll let @jbfbell take that one, as I don't have the latest info any more. |
@evantahler - Thanks for the review and for this context. I'll drop This PR doesn't touch capitalization rules so I think we're ok taking any needed changes to those in a subsequent pass. @jbfbell - Let me know if you have any other questions/concerns here. I think we're good to go with the addition of This PR doesn't dive into what to do with other special characters, but if you can point me at source code or a Notion/Docs page, I'll take on any other name normalization rules in a subsequent PR. Thanks, both! |
/fix-pr
|
@jbfbell and @bindipankhudi - If one or both of you is available to review/approve tomorrow, I'll go ahead and merge when ready. All tests are passing now. ✅ 😄 |
Resolves: #14
Resolves: #118
This update does a couple things:
_airbyte_emitted_at
- The timestamp corresponding toemitted_at
in the Airbyte Record message.~~- removed per comments below_airbyte_loaded_at
- The timestamp representingutcnow()
at the time the record is received by PyAirbyte._airbyte_meta
- For now, defaults to an empty dictionary._airbyte_raw_id
- A unique ID per record, assigned as it is received from source. (Unlike Dv2, there is no raw table but I kept the same column name for consistency.)As of this iteration, PyAirbyte will not...
...will not attempt to align_airbyte_loaded_at
with the batch load time.As noted above, the time is when we first start processing the record. In the future, we can try to set this during load time, but doing so is fragile and may not be possible in a universal/generic manner.(Removed, so no longer relevant.)_airbyte_meta
with any data.Tests