-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/UUID timestamp extraction #1070
Conversation
@jgraettinger see here for my explanation of flow/crates/derive/src/combine_api.rs Lines 65 to 70 in 93f8855
|
FYI you can update Go snapshots by running |
@@ -1,3 +1,5 @@ | |||
use std::iter; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: avoid extra space here. Have a single block of use
statements, so that they're automatically sorted by the formatter.
@@ -11,6 +11,7 @@ license.workspace = true | |||
[dependencies] | |||
doc = { path = "../doc" } | |||
json = { path = "../json" } | |||
derive = { path = "../derive" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not thrilled with adding new dependency on derive
. That crate is slatted to be taken out back to the woodshed...
.. | ||
}: &Param, | ||
document: &N, | ||
) -> rusqlite::Result<()> { | ||
use derive::PointerExt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't love the introduced dependence on derive
... that crate is slated to be taken out behind the woodshed.
@@ -7,6 +7,8 @@ use proto_flow::flow; | |||
pub struct Param { | |||
// Projection which is being related as a SQLite parameter. | |||
pub projection: flow::Projection, | |||
// Collection which contains the projection that is being related as a SQLite parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the reason for Param
is to provide separation from the details of CollectionSpec
representation and the way we actually use parameters for SQLite.
I'm thinking we need to introduce a is_uuid_timestamp
boolean on flow::Projection
.
Then, there would be projections like field: _meta/uuid/timestamp (or something custom) ptr: /_meta/uuid is_uuid_timestamp: true
That allows resolution of all this stuff from only the flow::Projection
instance, and we don't need to put logic for UUID_PTR+'/timestamp' anywhere except the validation logic that makes a flow::Projection
.
For this crate specifically, derive::PointerExt
is backed out and instead, within the existing :
Some(Node::String(s)) => {
if *is_format_integer {
block, you also test for is_uuid_timestamp
and extract it accordingly.
return None | ||
}; | ||
|
||
match v_uuid.as_node() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I'm thinking the bulk of this should live in flow-proto
-- perhaps an impl of flow::UuidParts::from_uuid(uuid: uuid::Uuid) -> Self
or even impl From<uuid::Uuid> for flow::UuidParts
etc would definitely make sense there.
@@ -93,13 +104,15 @@ impl cgo::Service for API { | |||
schema_json, | |||
key_ptrs, | |||
field_ptrs, | |||
uuid_placeholder_ptr, | |||
uuid_ptr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extrapolating from my comment above that is_uuid_timestamp
belongs in flow::Projection
, that implies that we need to pass flow::Projection
instances in for key_ptrs
and field_ptrs
instead of raw JSON pointers.
That's a shift, but I do think that's probably the right call. if there are other circumstances that arise where we want to condition extraction behavior then standardizing on the passed-around flow::Projection
greatly simplifies things.
come to think of it, default values are another good example! drain_chunk
uses it's own parsed Shape
below, but the flow::Projection
also bakes in the default value. And there was another circumstance this came up, i think the extract API, where the source schema shape couldn't easily be plumbed through...
Superseded by #1098 |
Description:
Customers were asking for the ability to include a timestamp column in their materializations that represents the point at which Flow saw that piece of data, effectively a "last modified" or "valid as of" field. We actually always had this data encoded as part of each document's UUID, and this PR allows you to get access to it via a "virtual projection" at the location
/_meta/uuid/timestamp
.Workflow steps:
In order to materialize this timestamp, you have to:
/_meta/uuid/timestamp
Unfortunately the UI does not support projections yet, so there is still some CLI work involved. It does technically support field selection by way of the "Advanced Specification Editor".
Given that, the workflow to materialize the meta uuid timestamp is:
flowctl catalog pull-specs
flowctl catalog publish
If everything works, you should that field materialized! Example of materializing a hello world capture to postgres with the meta timestamp projection:
This change is